Abstract. An objective operational theory of probabilistic parametric
inference is formulated without invoking the so-called non-informative
prior probability distributions.

1. INTRODUCTION

We make a probabilistic inference about a parameter of a family of the
so-called direct probability distributions by specifying a probability
distribution that corresponds to the distribution of our belief in
different values of the parameter (Jeffreys (1957), §2.0, p. 22). The
probabilistic parametric inference is characteristic of Bayesian schools
of statistical inference (as opposed to frequentist schools), where the
name Bayesian is due to the central role of Bayes' Theorem in the
process of inference. In the Bayesian paradigms, it is also possible to
make statements concerning the values of the inferred parameters in the
absence of data, and these statements can be summarized in the so-called
(non-informative) prior probability distributions (Villegas (1981); see
also, for example, Jeffreys (1961), §1.4, p. 33 and

1.6, pp. 30-31; 
3.5, p. 86: 



3.1, pp. 117 -118; [Ferguson 



Bergerl iwm. 



O'HaganI (Il994). U .21. p. 23: iKass and Wassernmnl 



iLadl (Il996l'l 
p. 193; [Robert 




1.2, pp. 4-5: iRaol (119931^ ■ 



3.4 



Shao 



(|l999l ). 



p. 150; 
toOVi ). §3.5, pp. 1 27-140: 



Casella and Berger (2002), §7.2.3, p. 324; Jaynes (2003), §4.1,
pp. 87-88; Harney (2003), §2.1, p. 9; Hogg et al. (2005), §11.2.1,
pp. 583-584). The non-informative prior distributions provide a formal
way of expressing ignorance about the inferred parameter (Jeffreys
(1961), §3.1, pp. 117-118; Kass and Wasserman (1996), §4.1, p. 1355).
It has been asserted (Jeffreys (1957), §2.3, p. 31; Jeffreys (1961),
§1.5, pp. 36-37; Bernardo (1979), §5.1, p. 123; Kass and Wasserman
(1996), §4.1, pp. 1355-1356; Robert (2001), §3.5, p. 127) that there is
no objective, unique non-informative prior distribution that represents
ignorance. Instead, the priors should be chosen by public agreement,
much like units of length and weight, upon which everyone could fall
back when the prior information about the inferred parameter is missing.

In the present article, a theory of probabilistic parametric inference
is developed without invoking the non-informative prior probability
distributions. Moreover, it is demonstrated that the non-informative
prior probability distributions necessarily lead to inconsistencies.
Sections 2-4 are devoted to formulation of a mathematical theory of
probabilistic parametric inference. In particular, in Section 2 the
notions of probability, of (direct) probability distribution, of
parametric family and of invariant family are introduced. In addition,
some of the properties of probability distributions are briefly
reviewed. In Section 3 the so-called inverse probability distributions
are defined. It is demonstrated that the inverse probability
distributions must be directly proportional to the appropriate direct
probability distributions. The proportionality factors, called
consistency factors, are determined in Section 4 on the grounds of
invariance of parametric families of direct probability distributions
under the action of Lie groups. In Section 5 the concept of relative
frequency and the concept of degree of belief are introduced that link
the probability distributions to an external world of measurable
phenomena. In this way, the mathematical theory becomes operational.
Also in Section 5, as well as in Conclusions, a reconciliation between
the Bayesian and the frequentist schools of parametric inference is
advocated.

2. PROBABILITIES AND PROBABILITY 
DISTRIBUTIONS 

2.1 Notation and general definitions 

In this section, the notions of probability and of probability
distribution are introduced, and some of the properties of probability
distributions are briefly reviewed, with special attention being paid to
conditional probability density functions. The purpose of refreshing
these well-known concepts is to avoid misunderstandings in subsequent
sections where the properties of probability distributions are
extensively invoked and the definition of the conditional probability
distribution is extended.

Let $\Omega$ be a non-empty universal set, also called a sample space,
whose elements are denoted by $\omega$. A set $\Sigma$ of subsets
$A, B, C, \ldots$ of the sample space is called a $\sigma$-algebra (or
$\sigma$-field) on $\Omega$ if $\Sigma$ has $\Omega$ as a member, and is
closed under complementation, $\bar{A} \in \Sigma$, $\forall A \in \Sigma$,
and under countable union, $\sum_{i=1}^{\infty} A_i \in \Sigma$,
$\forall A_1, A_2, \ldots \in \Sigma$ (throughout the present discussion,
$A + B$, $AB$ and $A - B$ denote a union, an intersection and a relative
complement of sets $A$ and $B$, respectively, while
$\bar{A} \equiv \Omega - A$). An ordered pair $(\Omega, \Sigma)$
consisting of a state space $\Omega$ and a $\sigma$-algebra $\Sigma$ on
$\Omega$ is called a measurable space.

Example 1 (Borel algebra). Let $\Omega$ be $\mathbb{R}^n$. The Borel
$\sigma$-algebra (or Borel algebra) $\mathcal{B}^n$ on $\mathbb{R}^n$ is
the minimal $\sigma$-algebra containing a collection of open rectangles
in $\mathbb{R}^n$. It is also said that the Borel algebra $\mathcal{B}^n$
on $\mathbb{R}^n$ is generated by all open rectangles in $\mathbb{R}^n$.
Every set from a Borel algebra is called a Borel set.



Definition 1 (Probability). Let $P$ be a real-valued function on a
$\sigma$-field $\Sigma$ on a sample space $\Omega$. We call $P$ a
probability measure (or simply a probability) if it is congruent with
the following three axioms due to Kolmogorov (1933):

(1)  $P(A) \ge 0$ , $\forall A \in \Sigma$ ,

(2)  $P(\Omega) = 1$ ,

(3)  $P\left(\sum_i A_i\right) = \sum_i P(A_i)$

for all $A_i, A_j \in \Sigma$ that are mutually exclusive, i.e.,
$A_i A_{j \neq i} = \emptyset$. Then, the triple $(\Omega, \Sigma, P)$ is
termed the probability space.

Definition 2 (Random variable). Given a probability space
$(\Omega, \Sigma, P)$, let a function $X : \Omega \to \mathbb{R}$ be
$\Sigma$-measurable:
$A_{X \le x} \equiv \{\omega \in \Omega : X(\omega) \le x\} \in \Sigma$,
$\forall x \in \mathbb{R}$. Then, $X$ is called a (real-valued) scalar
random variable (or random variate), while $x$ is called a realization
of $X$.

Definition 3 (Distribution function). Given a random variable $X$ on a
probability space $(\Omega, \Sigma, P)$, the (cumulative) distribution
function (cdf) $F_X(x)$ is a real-valued function on the state space
$\mathbb{R}$ to $[0,1]$ such that $F_X(x) \equiv P(A_{X \le x})$.

Every cdf is a non-decreasing function with
$F_X(-\infty) \equiv \lim_{x \to -\infty} F_X(x) = 0$ and

(4)  $F_X(+\infty) \equiv \lim_{x \to +\infty} F_X(x) = 1$ .

Definition 4 (Continuous random variable). A random variable $X$ is
called continuous if its cdf $F_X(x)$ is absolutely continuous, i.e., if
the cdf is expressible as an integral of a non-negative (Lebesgue)
integrable function $f_X(x)$, called probability density function (pdf):

     $F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx'$ .

The support of a continuous random variable $X$ is a set, say $V_X$, of
all $x$ for which $f_X(x) > 0$.

Due to (4), a pdf is always normalized to unit area,

(5)  $\int_{-\infty}^{+\infty} f_X(x')\,dx' = \int_{V_X} f_X(x')\,dx' = 1$ .



Two pdf's correspond to the same cdf precisely if they differ only on a
set of Lebesgue measure zero. On the other hand, a cdf of a continuous
random variable is differentiable almost everywhere on $\mathbb{R}$
(Stein and Shakarchi (2005), §3.2, Theorem 3.11, pp. 130-131) such that
the derivative can be used as a pdf.

Definition 5. Throughout the present discussion,

(6)  $f_X(x) = \frac{d}{dx} F_X(x)$

is assumed.

Definition 6 (Probability distribution). A function
$\Pr_X : \mathcal{B} \to [0,1]$ called probability distribution is
defined as the image measure of $P$ by the random variable $X$,
$\Pr_X \equiv P \circ X^{-1}$, such that $\Pr_X(S) = P[X^{-1}(S)]$,
where $X^{-1}(S) \in \Sigma$ is the inverse image of a Borel set $S$
under $X$. A probability distribution over a continuous random variable
$X$ is called a continuous probability distribution.

From the properties of the underlying probability spaces it follows
immediately that probability distributions for random variables also
conform to the axioms (1)-(3) of probability. Therefore, a scalar random
variate $X$ on a probability space $(\Omega, \Sigma, P)$ generates
another probability space $(\mathbb{R}, \mathcal{B}, \Pr_X)$ with the
Borel algebra $\mathcal{B} \equiv \mathcal{B}^1$ as underlying
$\sigma$-algebra.

Let $X$ and $Y$ be continuous random variables defined on
$(\Omega, \Sigma, P)$, let there exist a function $s$ on $V_X$ such that
$Y = s \circ X$ and $y = s(x)$, and let the function $s$ be
differentiable with non-vanishing derivative $s'(x)$ on the entire
support $V_X$ of $X$, such that $[s^{-1}(y)]' = [s'(x)]^{-1}$ exists for
all $y = s(x)$ with $x \in V_X$. Then, due to the common probability
space $(\Omega, \Sigma, P)$ underlying the spaces
$(\mathbb{R}, \mathcal{B}, \Pr_X)$ and $(\mathbb{R}, \mathcal{B}, \Pr_Y)$,

[Diagram relating the spaces $(\mathbb{R}, \mathcal{B}, \Pr_X)$,
$(\mathbb{R}, \mathcal{B}, \Pr_Y)$ and $(\Omega, \Sigma, P)$.]

for all $y$ for which $s^{-1}(y) \in V_X$ the cdf $F_Y$ for $Y$ can be
expressed in terms of $F_X$ as

(7)  $F_Y(y) = \begin{cases} F_X(s^{-1}(y)) \ ; & [s^{-1}(y)]' > 0 \\ 1 - F_X(s^{-1}(y)) \ ; & [s^{-1}(y)]' < 0 \end{cases}$

and the pdf for $Y$ is related to the pdf for $X$ as

(8)  $f_Y(y) = \frac{d}{dy} F_Y(y) = f_X(s^{-1}(y)) \, \bigl|[s^{-1}(y)]'\bigr|$ .

The image of $V_X$ under $s$ is contained in $V_Y$,
$s(V_X) \subseteq V_Y$, and the probability distribution
$\Pr_Y[V_Y - s(V_X)]$ for the relative complement of $V_Y$ and $s(V_X)$
is zero.
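
The relation between (7) and (8) can be checked numerically. The sketch below is only an illustration under stated assumptions: the standard normal distribution for $X$ and the two maps $s(x) = e^{x}$ and $s(x) = e^{-x}$ are choices made here, not taken from the text, and all function names are hypothetical. It compares a finite-difference derivative of the cdf (7) with the pdf (8) for one increasing and one decreasing map.

```python
import math

def normal_pdf(x):
    """Standard normal pdf, playing the role of f_X (an assumption of this sketch)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal cdf, playing the role of F_X."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Two illustrative maps, each given by its inverse and the derivative of
# that inverse: s(x) = exp(x) is increasing, s(x) = exp(-x) is decreasing.
MAPS = {
    "increasing": (lambda y: math.log(y), lambda y: 1.0 / y),
    "decreasing": (lambda y: -math.log(y), lambda y: -1.0 / y),
}

def transformed_cdf(y, s_inv, s_inv_prime):
    """Equation (7): the branch is selected by the sign of [s^{-1}(y)]'."""
    value = normal_cdf(s_inv(y))
    return value if s_inv_prime(y) > 0 else 1.0 - value

def transformed_pdf(y, s_inv, s_inv_prime):
    """Equation (8): f_Y(y) = f_X(s^{-1}(y)) |[s^{-1}(y)]'|."""
    return normal_pdf(s_inv(y)) * abs(s_inv_prime(y))

def central_difference(func, y, h=1e-5):
    """Symmetric finite-difference approximation of func'(y)."""
    return (func(y + h) - func(y - h)) / (2.0 * h)

def max_discrepancy(points=(0.3, 1.0, 2.5)):
    """Largest gap between d/dy of (7) and the pdf (8) over both maps."""
    worst = 0.0
    for s_inv, s_inv_prime in MAPS.values():
        for y in points:
            slope = central_difference(
                lambda t: transformed_cdf(t, s_inv, s_inv_prime), y)
            gap = abs(slope - transformed_pdf(y, s_inv, s_inv_prime))
            worst = max(worst, gap)
    return worst

if __name__ == "__main__":
    print(max_discrepancy())
```

The absolute value in (8) is what keeps the density non-negative for the decreasing map, where the derivative of the inverse is negative.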

The foregoing discussion about the probability 
distributions associated to scalar random variables 
is extended to multivariate random variables as fol- 
lows. 

Definition 7 (Random vectors). Given a probability space
$(\Omega, \Sigma, P)$, a vector function
$\mathbf{X} = (X_1, \ldots, X_n)$ is called a multivariate random
variable (or random vector) if
$A_{\mathbf{X} \le \mathbf{x}} \equiv \{\omega \in \Omega : X_1(\omega) \le x_1, \ldots, X_n(\omega) \le x_n\} \in \Sigma$,
$\forall \mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Every random
vector gives rise to a cdf $F_{\mathbf{X}}(x_1, \ldots, x_n)$ on the
state space $\mathbb{R}^n$ to $[0,1]$ such that
$F_{\mathbf{X}}(x_1, \ldots, x_n) \equiv P(A_{\mathbf{X} \le \mathbf{x}})$,
and to a joint probability distribution $\Pr_{\mathbf{X}}(S)$ on the
Borel algebra $\mathcal{B}^n$ to $[0,1]$,
$\Pr_{\mathbf{X}}(S) = P[\mathbf{X}^{-1}(S)]$, $S \in \mathcal{B}^n$.
Also, as for the scalar random variates, a random vector $\mathbf{X}$ is
called continuous if its cdf can be written as an integral of a pdf
$f_{\mathbf{X}}(x_1, \ldots, x_n)$,

     $F_{\mathbf{X}}(x_1, \ldots, x_n) = \int_{U_{\mathbf{X} \le \mathbf{x}}} f_{\mathbf{X}}(t_1, \ldots, t_n)\, dt_1 \cdots dt_n = \int_{-\infty}^{x_1} dt_1 \cdots \int_{-\infty}^{x_n} dt_n \, f_{\mathbf{X}}(t_1, \ldots, t_n)$ ,

where $U_{\mathbf{X} \le \mathbf{x}} \equiv \times_{i=1}^{n} (-\infty, x_i]$
is an infinite $n$-dimensional rectangle in the state space
$\mathbb{R}^n$, while the transition from an $n$-dimensional integral to
$n$ iterated integrals is justified by Fubini's Theorem (see, for
example, Bartle (1966), Chapter 10, pp. 119-120).

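Every (joint) probability distribution for a continuous $n$-vector
$\mathbf{X}$ can be expressed as an integral

     $\Pr_{\mathbf{X}}(S) = \int_S f_{\mathbf{X}}(\mathbf{x})\, d^n x$ ; $\forall S \in \mathcal{B}^n$ .

Let $\mathbf{X}$ and $\mathbf{Y}$ be $n$-dimensional continuous random
variables on a probability space $(\Omega, \Sigma, P)$, let
$f_{\mathbf{X}}(\mathbf{x})$ be a pdf for $\mathbf{X}$, and let
$\mathbf{s}$ be a differentiable function on $V_{\mathbf{X}}$ with
non-vanishing Jacobian $|\partial_{\mathbf{x}} \mathbf{s}(\mathbf{x})|$
such that $\mathbf{Y} = \mathbf{s} \circ \mathbf{X}$. Then, for all
$\mathbf{y}$ from the image of $V_{\mathbf{X}}$ under $\mathbf{s}$, the
pdf for $\mathbf{Y}$ reads:

(9)  $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(\mathbf{s}^{-1}(\mathbf{y})) \, |\partial_{\mathbf{y}} \mathbf{s}^{-1}(\mathbf{y})|$ .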



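Definition 8 (Marginal distributions). Let a random vector $\mathbf{X}$
be partitioned into a random $n$-vector $\mathbf{Y}$ and a random
$m$-vector $\mathbf{Z}$, $\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$. Then
$F_{\mathbf{X}}^{\mathbf{Y}}(\mathbf{y}) \equiv F_{\mathbf{X}}(\mathbf{y}, z_1 = \infty, \ldots, z_m = \infty)$
and
$F_{\mathbf{X}}^{\mathbf{Z}}(\mathbf{z}) \equiv F_{\mathbf{X}}(y_1 = \infty, \ldots, y_n = \infty, \mathbf{z})$
are called the marginal cdf's for the components $\mathbf{Y}$ and
$\mathbf{Z}$ of the partition $(\mathbf{Y}, \mathbf{Z})$ of
$\mathbf{X}$, respectively. Also, pdf's

     $f_{\mathbf{X}}^{\mathbf{Y}}(\mathbf{y}) \equiv \int_{\mathbb{R}^m} f_{\mathbf{X}}(\mathbf{y}, \mathbf{z})\, d^m z$

and

     $f_{\mathbf{X}}^{\mathbf{Z}}(\mathbf{z}) \equiv \int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{y}, \mathbf{z})\, d^n y$

are called the marginal pdf's for the components $\mathbf{Y}$ and
$\mathbf{Z}$ of a partition of a continuous random vector $\mathbf{X}$,
while the corresponding marginal probability distributions are denoted
by $\Pr_{\mathbf{X}}^{\mathbf{Y}}(U)$ and
$\Pr_{\mathbf{X}}^{\mathbf{Z}}(S)$, $U \in \mathcal{B}^n$ and
$S \in \mathcal{B}^m$.

Usually, abbreviated notations may be used, e.g.,
$F_{\mathbf{X}}(\mathbf{y}) \equiv F_{\mathbf{X}}^{\mathbf{Y}}(\mathbf{y})$
and
$f_{\mathbf{X}}(\mathbf{z}) \equiv f_{\mathbf{X}}^{\mathbf{Z}}(\mathbf{z})$.
Since, however, in $F_{\mathbf{X}}(\mathbf{y})$ and in
$f_{\mathbf{X}}(\mathbf{y})$ the arguments of the functions denote also
the functions themselves, it should be noted that
$F_{\mathbf{X}}(\mathbf{y})$ and $f_{\mathbf{X}}(\mathbf{y})$ are not
necessarily the same functions as $F_{\mathbf{X}}(\mathbf{z})$ and
$f_{\mathbf{X}}(\mathbf{z})$, respectively.

Definition 9 (Conditional probability distributions). Let
$(\Omega, \Sigma, P)$ be a probability space and
$\mathbf{X} = (\mathbf{Y}, \mathbf{Z}) : \Omega \to \mathbb{R}^n \times \mathbb{R}^m$
a $\Sigma$-measurable function that gives rise to a probability
distribution
$\Pr_{\mathbf{X}} : \mathcal{B}^n \times \mathcal{B}^m \to [0,1]$, let
$(\mathbb{R}^n, \mathcal{B}^n, \Pr_{\mathbf{X}}^{\mathbf{Y}})$ and
$(\mathbb{R}^m, \mathcal{B}^m, \Pr_{\mathbf{X}}^{\mathbf{Z}})$ be the
spaces of the marginal probability distributions for the components
$\mathbf{Y}$ and $\mathbf{Z}$ of the partition
$(\mathbf{Y}, \mathbf{Z})$ of $\mathbf{X}$, and let
$1_{\mathbf{Y}^{-1}(U)}$, $U \in \mathcal{B}^n$, be the indicator
function on $\Omega$: $1_{\mathbf{Y}^{-1}(U)}(\omega) = 1$ for
$\omega \in \mathbf{Y}^{-1}(U)$ and $0$ otherwise. Then, a function
$\nu_{1_{\mathbf{Y}^{-1}(U)}} : \Sigma' \to \mathbb{R}$,
$\Sigma' \equiv \mathbf{Z}^{-1}(\mathcal{B}^m) \subset \Sigma$,
$S \in \mathcal{B}^m$, is a finite measure on $\Sigma'$, and so is
finite the image measure $\tilde{\nu}_{1_{\mathbf{Y}^{-1}(U)}}$ of the
measure $\nu_{1_{\mathbf{Y}^{-1}(U)}}$ by $\mathbf{Z}$,
$\tilde{\nu}_{1_{\mathbf{Y}^{-1}(U)}} : \mathcal{B}^m \to \mathbb{R}$,
$\tilde{\nu}_{1_{\mathbf{Y}^{-1}(U)}} \equiv \nu_{1_{\mathbf{Y}^{-1}(U)}} \circ \mathbf{Z}^{-1}$.
The function
$\Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z}) : \mathbb{R}^m \to \mathbb{R}$
called conditional probability distribution for $\mathbf{Y}$ given the
value $\mathbf{Z} = \mathbf{z}$, is then defined by the set of
functional equations:

(10)  $\tilde{\nu}_{1_{\mathbf{Y}^{-1}(U)}}(S) = \int_S \Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z})\, d\Pr_{\mathbf{X}}^{\mathbf{Z}}(\mathbf{z})$ ,

while the corresponding conditional cdf is denoted by
$F_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(\mathbf{y}|\mathbf{z})$.

The definition of
$\Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z})$ can
be interpreted to say that the diagram

[Diagram with arrows labelled $1_{\mathbf{Y}^{-1}(U)}$ and
$\Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z})$.]

is commutative in the average with respect to
$\Pr_{\mathbf{X}}^{\mathbf{Z}}$.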



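Definition 10 (Conditional pdf). Let
$\Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z})$ be a
solution of (10). For continuous $\mathbf{X}$, the system of equations

(11)  $\Pr_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(U|\mathbf{z}) = \int_U f_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(\mathbf{y}|\mathbf{z})\, d^n y$

for all $U \in \mathcal{B}^n$, is the defining condition for the
conditional pdf $f_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}$ for
$\mathbf{Y}$ given $\mathbf{Z} = \mathbf{z}$.

For conditional cdf's and pdf's, abbreviated notations
$F_{\mathbf{X}}(\mathbf{y}|\mathbf{z}) \equiv F_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(\mathbf{y}|\mathbf{z})$
and
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}) \equiv f_{\mathbf{X}}^{\mathbf{Y}|\mathbf{Z}=\mathbf{z}}(\mathbf{y}|\mathbf{z})$
may again be used.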



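Proposition 1. Let $f_{\mathbf{X}}(\mathbf{y}, \mathbf{z})$ be a joint
pdf for a $(n + m)$-dimensional random vector
$\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$ and let
$f_{\mathbf{X}}(\mathbf{z})$ be the marginal pdf for $\mathbf{Z}$,
supported on $V_{\mathbf{Z}}$. Then,

(12)  $f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}) = \frac{f_{\mathbf{X}}(\mathbf{y}, \mathbf{z})}{f_{\mathbf{X}}(\mathbf{z})}$

holds true uniquely on
$(\mathbb{R}^n - U_0) \times (V_{\mathbf{Z}} - S_0)$, where
$\Pr_{\mathbf{X}}^{\mathbf{Z}}(S_0) = \nu_L^n(U_0) = 0$,
$\nu_L^n(U_0) \equiv \int_{U_0} d^n y$. It is said that
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z})$ is determined uniquely
$\Pr_{\mathbf{X}}^{\mathbf{Z}}$-almost everywhere on $V_{\mathbf{Z}}$
and $\nu_L^n$-almost everywhere on $\mathbb{R}^n$.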
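
As a numerical illustration of (12), the following sketch forms the ratio of a joint pdf to its marginal and compares it with a known closed form. The bivariate normal distribution with correlation 0.6, the integration grid, and all function names are assumptions made for this example only; they do not appear in the text.

```python
import math

RHO = 0.6  # illustrative correlation, an assumption of this sketch

def joint_pdf(y, z, rho=RHO):
    """Joint pdf of a standard bivariate normal pair (Y, Z) with correlation rho."""
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho * rho)
    quad = (y * y - 2.0 * rho * y * z + z * z) / (1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def marginal_pdf_z(z, rho=RHO, half_width=12.0, steps=6000):
    """Marginal pdf for Z: trapezoidal integration of the joint pdf over y."""
    h = 2.0 * half_width / steps
    total = 0.5 * (joint_pdf(-half_width, z, rho) + joint_pdf(half_width, z, rho))
    for k in range(1, steps):
        total += joint_pdf(-half_width + k * h, z, rho)
    return total * h

def conditional_pdf(y, z, rho=RHO):
    """Equation (12): joint pdf divided by the marginal pdf for Z."""
    return joint_pdf(y, z, rho) / marginal_pdf_z(z, rho)

def closed_form_conditional(y, z, rho=RHO):
    """Known result for this family: Y given Z = z is normal(rho*z, 1 - rho^2)."""
    var = 1.0 - rho * rho
    return math.exp(-0.5 * (y - rho * z) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

if __name__ == "__main__":
    for y, z in [(0.0, 0.0), (0.5, 1.0), (-1.2, 0.7)]:
        print(conditional_pdf(y, z), closed_form_conditional(y, z))
```

The ratio is only evaluated where the marginal pdf is positive, which mirrors the restriction to $V_{\mathbf{Z}}$ in the proposition.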

Remark 1. First, the reason for adopting an indirect definition of the
conditional pdf's is that the more direct formulations like, for
example, the approach that is based on the L'Hopital rule (see, for
example, Rao (1993), §1.4, pp. 13-14) and the axiomatization of Rényi
(1955), do not lead to uniquely defined conditional pdf's. For a
discussion on the resulting inconsistencies see Rao (1993), Chapters 3
and 4, pp. 63-121. Second, below, existence of a joint pdf
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{z})$ is not a necessary condition
for existence of the corresponding conditional pdf's
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z})$ and
$f_{\mathbf{X}}(\mathbf{z}|\mathbf{y})$.

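Let there exist a conditional pdf
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{z}|\mathbf{t})$,
$\mathbf{X} = (\mathbf{Y}, \mathbf{Z}, \mathbf{T})$, and let the
marginal distribution

     $f_{\mathbf{X}}(\mathbf{z}|\mathbf{t}) \equiv \int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{y}, \mathbf{z}|\mathbf{t})\, d^n y$

be positive. Then, by an iterative application of Definition 10,

(13)  $f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t}) = \frac{f_{\mathbf{X}}(\mathbf{y}, \mathbf{z}|\mathbf{t})}{f_{\mathbf{X}}(\mathbf{z}|\mathbf{t})}$ .

The results of the following example are obtained by sequential
applications of the product rule (13).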



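Example 2. Let $\mathbf{X}$ be partitioned into
$(\mathbf{Y}, \mathbf{Z}, \mathbf{T}, \mathbf{W})$ and let there exist
conditional pdf's
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{t}|\mathbf{z}, \mathbf{w})$ and
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{w}|\mathbf{z}, \mathbf{t})$. Then,
in analogy with (13), for
$f_{\mathbf{X}}(\mathbf{t}|\mathbf{z}, \mathbf{w}), f_{\mathbf{X}}(\mathbf{w}|\mathbf{z}, \mathbf{t}) > 0$
there exists a conditional pdf
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t}, \mathbf{w})$ such
that

     $f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t}, \mathbf{w}) = \frac{f_{\mathbf{X}}(\mathbf{y}, \mathbf{t}|\mathbf{z}, \mathbf{w})}{f_{\mathbf{X}}(\mathbf{t}|\mathbf{z}, \mathbf{w})} = \frac{f_{\mathbf{X}}(\mathbf{y}, \mathbf{w}|\mathbf{z}, \mathbf{t})}{f_{\mathbf{X}}(\mathbf{w}|\mathbf{z}, \mathbf{t})}$ .

When, in addition, the marginal pdf's
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{w})$ and
$f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t})$ are also
non-vanishing, the joint pdf's
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{t}|\mathbf{z}, \mathbf{w})$ and
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{w}|\mathbf{z}, \mathbf{t})$ can be
further decomposed as
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{t}|\mathbf{z}, \mathbf{w}) = f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{w})\, f_{\mathbf{X}}(\mathbf{t}|\mathbf{y}, \mathbf{z}, \mathbf{w})$
and
$f_{\mathbf{X}}(\mathbf{y}, \mathbf{w}|\mathbf{z}, \mathbf{t}) = f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t})\, f_{\mathbf{X}}(\mathbf{w}|\mathbf{y}, \mathbf{z}, \mathbf{t})$,
such that

(14)  $f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t}, \mathbf{w}) = \frac{f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{w})\, f_{\mathbf{X}}(\mathbf{t}|\mathbf{y}, \mathbf{z}, \mathbf{w})}{f_{\mathbf{X}}(\mathbf{t}|\mathbf{z}, \mathbf{w})} = \frac{f_{\mathbf{X}}(\mathbf{y}|\mathbf{z}, \mathbf{t})\, f_{\mathbf{X}}(\mathbf{w}|\mathbf{y}, \mathbf{z}, \mathbf{t})}{f_{\mathbf{X}}(\mathbf{w}|\mathbf{z}, \mathbf{t})}$ .

In the same way,

(15)  $f_{\mathbf{X}}(\mathbf{y}|\mathbf{t}, \mathbf{w}) = \frac{f_{\mathbf{X}}(\mathbf{y}|\mathbf{w})\, f_{\mathbf{X}}(\mathbf{t}|\mathbf{y}, \mathbf{w})}{f_{\mathbf{X}}(\mathbf{t}|\mathbf{w})} = \frac{f_{\mathbf{X}}(\mathbf{y}|\mathbf{t})\, f_{\mathbf{X}}(\mathbf{w}|\mathbf{y}, \mathbf{t})}{f_{\mathbf{X}}(\mathbf{w}|\mathbf{t})}$

is obtained when $\mathbf{X}$ is partitioned into
$(\mathbf{Y}, \mathbf{T}, \mathbf{W})$.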



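Example 3 (Transformations of conditional pdf's). Let
$\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)$ be a continuous
$(n_1 + n_2)$-dimensional random variable and
$f_{\mathbf{X}}(\mathbf{x}_1|\mathbf{x}_2)$ be a conditional pdf. Let,
in addition,
$\mathbf{s} : V_{\mathbf{X}} \to \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$
be a differentiable function such that
$\mathbf{Y} = (\mathbf{Y}_1, \mathbf{Y}_2) = \mathbf{s} \circ \mathbf{X} = (\mathbf{s}_1 \circ \mathbf{X}_1, \mathbf{s}_2 \circ \mathbf{X}_2)$
and that the Jacobian
$|\partial_{\mathbf{x}} \mathbf{s}(\mathbf{x})| = |\partial_{\mathbf{x}_1} \mathbf{s}_1(\mathbf{x}_1)|\, |\partial_{\mathbf{x}_2} \mathbf{s}_2(\mathbf{x}_2)|$
does not vanish on the entire support $V_{\mathbf{X}}$ of $\mathbf{X}$.
For $f_{\mathbf{X}}(\mathbf{s}_2^{-1}(\mathbf{y}_2)) > 0$, equations (8)
and (9) applied to the conditional pdf
$f_{\mathbf{Y}}(\mathbf{y}_1|\mathbf{y}_2) = f_{\mathbf{Y}}(\mathbf{y}_1, \mathbf{y}_2) / f_{\mathbf{Y}}(\mathbf{y}_2)$
then yield

(16)  $f_{\mathbf{Y}}(\mathbf{y}_1|\mathbf{y}_2) = \frac{f_{\mathbf{X}}(\mathbf{s}_1^{-1}(\mathbf{y}_1), \mathbf{s}_2^{-1}(\mathbf{y}_2))}{f_{\mathbf{X}}(\mathbf{s}_2^{-1}(\mathbf{y}_2))}\, |\partial_{\mathbf{y}_1} \mathbf{s}_1^{-1}(\mathbf{y}_1)| = f_{\mathbf{X}}(\mathbf{s}_1^{-1}(\mathbf{y}_1)|\mathbf{s}_2^{-1}(\mathbf{y}_2))\, |\partial_{\mathbf{y}_1} \mathbf{s}_1^{-1}(\mathbf{y}_1)|$ .

During the present discussion we allow for a possibility that a
conditional pdf $f_{\mathbf{X}}(\mathbf{x}_1|\mathbf{x}_2)$ exists even
when the corresponding joint pdf
$f_{\mathbf{X}}(\mathbf{x}_1, \mathbf{x}_2)$ does not exist. When
$f_{\mathbf{X}}(\mathbf{x}_1, \mathbf{x}_2)$ does not exist, however,
the transformation (16) of the conditional pdf that is induced by the
transformation of the random vector ceases to be uniquely determined.
In order to dismiss this ambiguity, the following definition, motivated
by the preceding example, is adopted.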

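Definition 11 (Transformations of conditional pdf's). Let there exist a
conditional pdf $f_{\mathbf{X}}(\mathbf{x}_1|\mathbf{x}_2)$,
$\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)$ and
$\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$, and let a function
$\mathbf{s} : (\mathbf{x}_1, \mathbf{x}_2) \to (\mathbf{s}_1(\mathbf{x}_1), \mathbf{s}_2(\mathbf{x}_2)) = (\mathbf{y}_1, \mathbf{y}_2)$
be one-to-one and with non-vanishing Jacobian
$|\partial_{\mathbf{x}_1} \mathbf{s}_1(\mathbf{x}_1)|$ on the entire
support $V_{\mathbf{X}_1|\mathbf{x}_2}$ of
$f_{\mathbf{X}}(\mathbf{x}_1|\mathbf{x}_2)$. Then, the conditional pdf
$f_{\mathbf{Y}}(\mathbf{y}_1|\mathbf{y}_2)$,
$\mathbf{Y} = (\mathbf{s}_1 \circ \mathbf{X}_1, \mathbf{s}_2 \circ \mathbf{X}_2) = (\mathbf{Y}_1, \mathbf{Y}_2)$,
is defined as

(17)  $f_{\mathbf{Y}}(\mathbf{y}_1|\mathbf{y}_2) \equiv f_{\mathbf{X}}(\mathbf{s}_1^{-1}(\mathbf{y}_1)|\mathbf{s}_2^{-1}(\mathbf{y}_2))\, |\partial_{\mathbf{y}_1} \mathbf{s}_1^{-1}(\mathbf{y}_1)|$ ,

where $\mathbf{s}_{1,2}^{-1}$ are the inverse functions of
$\mathbf{s}_{1,2}$.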

2.2 Parametric families of probability 
distributions 

The term parametric family is used to describe a collection
$I = \{\Pr_{I,\theta} : \theta \in V_\Theta\}$ of probability
distributions that differ only in the value of a (possibly
multi-dimensional) parameter, say $\Theta$, i.e., a value $\theta$ of
$\Theta$ determines a unique distribution within $I$. Therefore, a
probability distribution for a random $n$-vector $\mathbf{X}$,
$\Pr_{\mathbf{X}}(S_{\mathbf{X}})$, $S_{\mathbf{X}} \in \mathcal{B}^n$,
that belongs to a particular parametric family $I$, is denoted by
$\Pr_{I,\theta}(S_{\mathbf{X}})$, whereas $F_{I,\theta}(\mathbf{x})$
stands for the corresponding cdf. Likewise, $f_{I,\theta}(\mathbf{x})$
denotes a unique pdf within a parametric family $I$ of continuous
probability distributions. A continuous probability distribution from a
parametric family $I$ is supported on a set
$V_{\mathbf{X}} = V_{\mathbf{X}}(\theta)$ that may, in general, depend
on the value $\theta$ of the parameter, while the range
$V_\Theta \subseteq \mathbb{R}^m$ of admissible values of $\Theta$ is
called a parameter space. In the present article, every considered
parametric family is assumed to be identifiable:
$\Pr_{I,\theta_1} \neq \Pr_{I,\theta_2}$ for $\theta_1 \neq \theta_2$,
$\theta_{1,2} \in V_\Theta$.

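Example 4 (Reparameterization). Let $f_{I,\theta}(\mathbf{x})$ be a pdf
for a random $n$-vector $\mathbf{X}$ from a parametric family $I$ and
let $\mathbf{s}$ be a one-to-one Borel function onto $\mathbb{R}^n$ such
that the Jacobian $|\partial_{\mathbf{x}} \mathbf{s}(\mathbf{x})|$ does
not vanish anywhere on the support $V_{\mathbf{X}}(\theta)$ of
$\mathbf{X}$. Then, according to (9),
$f_{I',\theta}(\mathbf{y}) = f_{I,\theta}(\mathbf{s}^{-1}(\mathbf{y}))\, |\partial_{\mathbf{y}} \mathbf{s}^{-1}(\mathbf{y})|$,
where $\mathbf{Y} = \mathbf{s} \circ \mathbf{X}$,
$\mathbf{y} = \mathbf{s}(\mathbf{x})$ and $\mathbf{s}^{-1}$ is the
inverse function of $\mathbf{s}$, while indices $I$ and $I'$ indicate
that probability distributions for $\mathbf{X}$ and $\mathbf{Y}$ in
general belong to different (but isomorphic) parametric families. Let,
in addition, $\bar{\mathbf{s}}$ be a one-to-one function on the
parameter space $V_\Theta \subseteq \mathbb{R}^m$, such that
$\lambda = \bar{\mathbf{s}}(\theta)$. Then, $f_{I',\theta}(\mathbf{y})$
can be reparameterized as

(18)  $f_{I'',\lambda}(\mathbf{y}) = f_{I',\bar{\mathbf{s}}^{-1}(\lambda)}(\mathbf{y}) = f_{I,\bar{\mathbf{s}}^{-1}(\lambda)}(\mathbf{s}^{-1}(\mathbf{y}))\, |\partial_{\mathbf{y}} \mathbf{s}^{-1}(\mathbf{y})|$ ,

where $\bar{\mathbf{s}}^{-1}$ is the inverse function of
$\bar{\mathbf{s}}$.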
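
A small numerical sketch may help to see how (18) acts on a concrete family. Everything specific below is an assumption made for illustration: the exponential family with scale parameter $\theta$, the maps $s(x) = \ln x$ and $\bar{s}(\theta) = \ln\theta$, and the function names. With these choices the reparameterized pdf depends on $y$ and $\lambda$ only through $y - \lambda$.

```python
import math

def pdf_original(x, theta):
    """f_{I,theta}(x): exponential pdf with scale theta, x > 0 (illustrative choice)."""
    return math.exp(-x / theta) / theta

def pdf_reparameterized(y, lam):
    """Equation (18) for the assumed maps s(x) = ln x and s_bar(theta) = ln theta."""
    x = math.exp(y)          # s^{-1}(y)
    theta = math.exp(lam)    # s_bar^{-1}(lambda)
    jacobian = math.exp(y)   # |d s^{-1}(y) / dy|
    return pdf_original(x, theta) * jacobian

def closed_form(y, lam):
    """Hand-derived form for this choice: exp(u - exp(u)) with u = y - lambda."""
    u = y - lam
    return math.exp(u - math.exp(u))

if __name__ == "__main__":
    for y, lam in [(0.3, -0.4), (1.1, 0.9), (-2.0, 0.5)]:
        print(pdf_reparameterized(y, lam), closed_form(y, lam))
```

Shifting $y$ and $\lambda$ by the same constant leaves the value unchanged, so $\lambda$ acts as a location parameter for the transformed variable in this example.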

There is a complete analogy between the transformation (18) and the
transformations (16) and (17), such that every probability distribution
from a parametric family can be regarded as a conditional distribution,
i.e., as a distribution that is conditional upon the value of the
parameter. Accordingly, we define
$F_I(\mathbf{x}|\theta) \equiv F_{I,\theta}(\mathbf{x})$ and
$\Pr_I(S_{\Theta=\theta,\mathbf{X}}|\theta) \equiv \Pr_{I,\theta}(S_{\mathbf{X}})$,
$S_{\Theta=\theta,\mathbf{X}} \in \mathcal{B}^n$, and, for continuous
$\mathbf{X}$,

(19)  $f_I(\mathbf{x}|\theta) \equiv f_{I,\theta}(\mathbf{x})$

for all $\mathbf{x} \in \mathbb{R}^n$ and
$\theta \in V_\Theta \subseteq \mathbb{R}^m$.

$F_I(\mathbf{x}|\theta)$, $\Pr_I(S_{\Theta=\theta,\mathbf{X}}|\theta)$
and $f_I(\mathbf{x}|\theta)$ are underlain by a probability space
$(\Omega_\theta, \Sigma_\theta, P)$ and by a $(m + n)$-dimensional
random variable
$(\Theta, \mathbf{X}) : \Omega_\theta \to (\theta, \mathbb{R}^n)$ for
all $\theta \in V_\Theta$, where every state space
$(\theta, \mathbb{R}^n)$ is a slice on $V_\Theta \times \mathbb{R}^n$
that corresponds to a particular value $\theta$ of a $m$-dimensional
parameter $\Theta$ of the family $I$. The probability distributions
$\Pr_I(S_{\Theta=\theta,\mathbf{X}}|\theta)$ on Borel
$\sigma$-algebras $\mathcal{B}^n$ on such slices are called direct
probability distributions and represent the first step towards a
unified approach to random variables and parameters from parametric
families. The second step is made in Section 3, where the notion of the
inverse probability distribution is introduced.



Remark 2. The results of Subsections 2.2 and 2.3 are independent of the
preceding definitions. The only reason to define
$F_I(\mathbf{x}|\theta)$ and $f_I(\mathbf{x}|\theta)$ already at this
stage is to avoid unnecessary duplications in notation.

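Definition 12 (Independent random variables). When
$f_I(\mathbf{x}|\mathbf{y}, \theta) = f_I(\mathbf{x}|\theta)$ and
$f_I(\mathbf{y}|\mathbf{x}, \theta) = f_I(\mathbf{y}|\theta)$, the
components $\mathbf{X}$ and $\mathbf{Y}$ of a continuous random vector
$(\mathbf{X}, \mathbf{Y})$ are called independent random variables.
When, in addition, $f_I(\mathbf{x}|\theta)$ and
$f_I(\mathbf{y}|\theta)$ are the same functions, the variables
$\mathbf{X}$ and $\mathbf{Y}$ are said to have identical probability
distribution.

When the components $\mathbf{X}$ and $\mathbf{Y}$ of a random vector
$(\mathbf{X}, \mathbf{Y})$ are independent random variables and the
joint pdf $f_I(\mathbf{x}, \mathbf{y}|\theta)$ exists, the latter can be
written as
$f_I(\mathbf{x}, \mathbf{y}|\theta) = f_I(\mathbf{x}|\theta)\, f_I(\mathbf{y}|\theta)$.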

Definition 13 (Location and scale parameters). Suppose a cdf for a
scalar random variable $X$ from a parametric family $I$ is of the form

(20)  $F_I(x|\mu, \sigma) = \Phi\left(\frac{x - \mu}{\sigma}\right)$ ,

where $\mu$ is a realization of the first component of a
two-dimensional parameter $\Theta = (\Theta_1, \Theta_2)$, whereas
$\sigma$ is a realization of its second component. Then, $\Theta_1$ is
called a location parameter and $\Theta_2$ is called a scale parameter,
while $V_\Theta = \mathbb{R} \times \mathbb{R}^+$.

When probability distributions from a location-scale family $I$ are
continuous, on the support $V_X(\mu, \sigma)$ of a distribution from the
family the appropriate pdf is of the form

(21)  $f_I(x|\mu, \sigma) = \frac{d}{dx} F_I(x|\mu, \sigma) = \frac{1}{\sigma}\, \phi\left(\frac{x - \mu}{\sigma}\right)$ ,
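where $\phi(x) \equiv \Phi'(x)$. Except for $x = \mu$, every pdf (21)
from a location-scale family can be written as a sum

     $f_I(x|\mu, \sigma) = c_+ f_{I^+}(x|\mu, \sigma) + c_- f_{I^-}(x|\mu, \sigma)$ ,

where

     $c_+ f_{I^+}(x|\mu, \sigma) \equiv \begin{cases} 0 \ ; & \frac{x - \mu}{\sigma} < 0 \\ f_I(x|\mu, \sigma) \ ; & \frac{x - \mu}{\sigma} > 0 \end{cases}$

and

     $c_- f_{I^-}(x|\mu, \sigma) \equiv \begin{cases} f_I(x|\mu, \sigma) \ ; & \frac{x - \mu}{\sigma} < 0 \\ 0 \ ; & \frac{x - \mu}{\sigma} > 0 \end{cases}$

while

     $c_+ \equiv \int_0^{\infty} \phi(u)\, du$ and $c_- \equiv \int_{-\infty}^{0} \phi(u)\, du$ .

For $c_\pm > 0$, there exist pdf's
$f_{I^\pm}(x|\mu, \sigma) = f_I(x|\mu, \sigma)/c_\pm$, which can be
further reduced to

(22)  $f_{I^{\pm\prime}}(y|\lambda_1, \lambda_2 = 1) = \frac{1}{c_\pm}\, e^{y - \lambda_1}\, \phi\left(\pm e^{y - \lambda_1}\right)$ ,

where $y = \ln\{\pm(x - \mu)\}$ and $\lambda_1 = \ln \sigma$. That is,
every scale parameter for a location-scale family $I$ is reducible to a
location parameter for a parametric family $I^{\pm\prime}$.

2.3 Invariant families of probability distributions

Let $G = \{a, b, c, \ldots\}$ be a group whose unit element is denoted
by $e$ and let $\mathbf{l}$ be a function on $G \times \mathbb{R}^n$ to
$\mathbb{R}^n$ satisfying $\mathbf{l}(e, \mathbf{x}) = \mathbf{x}$,
$\forall \mathbf{x} \in \mathbb{R}^n$ and
$\mathbf{l}(a \circ b, \mathbf{x}) = \mathbf{l}[a, \mathbf{l}(b, \mathbf{x})]$,
$\forall a, b \in G$ and $\forall \mathbf{x} \in \mathbb{R}^n$. Such a
function specifies $G$ acting on the left of $\mathbb{R}^n$ and a group
$\mathcal{G} = \{\mathbf{g}_a : a \in G\}$ of functions
$\mathbf{g}_a : \mathbb{R}^n \to \mathbb{R}^n$,
$\mathbf{g}_a(\mathbf{x}) \equiv \mathbf{l}(a, \mathbf{x})$. A
composition of $\mathbf{g}_a, \mathbf{g}_b \in \mathcal{G}$ corresponds
to the composition of $a, b \in G$,
$\mathbf{g}_a[\mathbf{g}_b(\mathbf{x})] = (\mathbf{g}_a \circ \mathbf{g}_b)(\mathbf{x}) = \mathbf{g}_{a \circ b}(\mathbf{x})$,
$\mathbf{g}_e$ is the unit element in $\mathcal{G}$ and
$\mathbf{g}_{a^{-1}} = \mathbf{g}_a^{-1}$, $\forall a \in G$ (see, for
example, Eaton (1989), §2.1, pp. 19-20).

Definition 14 (Invariant family). Let $F_I(\mathbf{x}|\theta)$ be a cdf
from a parametric family $I$, let there exist a group $G$ and a function
$\mathbf{l} : G \times \mathbb{R}^n \to \mathbb{R}^n$ specifying both an
action of $G$ on the left of the state space $\mathbb{R}^n$ of the
random $n$-vector $\mathbf{X}$ and a group
$\mathcal{G} = \{\mathbf{g}_a : a \in G\}$,
$\mathbf{g}_a(\cdot) = \mathbf{l}(a, \cdot)$, and let
$\mathbf{Y} = \mathbf{g}_a \circ \mathbf{X}$. In addition, let for every
$\mathbf{g}_a \in \mathcal{G}$ and every $\theta \in V_\Theta$ there
exist a transformation
$\bar{\mathbf{g}}_a : \theta \to \bar{\mathbf{g}}_a(\theta) = \lambda$,
such that

(23)  $F_{I'}(\mathbf{y}|\lambda) = F_I(\mathbf{y}|\lambda)$ ,

where $\mathbf{y} = \mathbf{g}_a(\mathbf{x})$. The family $I$ is then
said to be invariant under the group $\mathcal{G}$ (or
$\mathcal{G}$-invariant or invariant under the action of the group $G$).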

Given a $\mathcal{G}$-invariant parametric family $I$, the set
$\bar{\mathcal{G}} = \{\bar{\mathbf{g}}_a : \mathbf{g}_a \in \mathcal{G}\}$
of the corresponding transformations on the parameter space is also a
group (Ferguson (1967), §4.1, Lemma 1, pp. 144-145), usually referred to
as the induced group (Stuart et al. (1999), §23.10, p. 300).



Let elements of a group $G$ be defined by the values of $n$ continuous
real parameters (or coordinates), e.g., $a = \gamma(a_1, \ldots, a_n)$
with $\gamma$ being a function on (a subset of) $\mathbb{R}^n$ to $G$.
The coordinates are essential in the sense that the group elements
cannot be distinguished by any number of coordinates smaller than the
dimension $n$ of the group $G$. Since, by definition, every group is
closed under composition of its elements, $a \circ b = c \in G$,
$\forall a, b \in G$, the coordinates of $c$ are expressible as
functions of the coordinates of $a$ and $b$,
$c_i = c_i(a_1, \ldots, a_n; b_1, \ldots, b_n)$, $i = 1, \ldots, n$.



Example 5 (One-dimensional groups). Coordinates of elements of a
one-dimensional group $G$ also form a group
$\widetilde{G} \subseteq \mathbb{R}$ with
$a_1 \circ b_1 = c_1(a_1, b_1)$ being the corresponding group operation
in $\widetilde{G}$. Therefore, since $G$ and $\widetilde{G}$ are
isomorphic, no generality is lost if $a = \gamma(a_1) = a_1$ is assumed.

When coordinates $c_i$ of an element $c = a \circ b$ of an
$n$-dimensional group $G$ are smooth (i.e., $C^\infty$) functions of the
parameters of $a$ and $b$, $G$ is called a Lie group.

Example 6 (Invariance of location-scale families).
$G = \mathbb{R} \times \mathbb{R}^+$ is a two-dimensional Lie group for
the operations

(24)  $a \circ b = (a_2 b_1 + a_1, a_2 b_2)$ .

Every location-scale family $I$ (20) of continuous probability
distributions is invariant under the group

(25)  $\mathcal{G} = \{g_a : X \to a_2 X + a_1 ; (a_1, a_2) \in G\}$ ,

with

(26)  $\bar{g}_a : (\Theta_1, \Theta_2) \to (a_2 \Theta_1 + a_1, a_2 \Theta_2)$

being the corresponding transformations from the induced group
$\bar{\mathcal{G}}$. The family $I$ is also invariant under two
one-dimensional subgroups of the group $\mathcal{G}$: under the group
$\mathcal{G}_\times = \{g_a : X \to aX ; a \in \mathbb{R}^+\}$, where
$\mathbb{R}^+$ is a one-dimensional Lie group for multiplication and
$\{\bar{g}_a : (\Theta_1, \Theta_2) \to (a\Theta_1, a\Theta_2) ; g_a \in \mathcal{G}_\times\}$
is the group induced by $\mathcal{G}_\times$, and under the group
$\mathcal{G}_+ = \{g_a : X \to a + X ; a \in \mathbb{R}\}$, with
$\mathbb{R}$ being a one-dimensional Lie group for summation and with
$\{\bar{g}_a : (\Theta_1, \Theta_2) \to (a + \Theta_1, \Theta_2) ; g_a \in \mathcal{G}_+\}$
being the corresponding induced group.

Similarly, a family of continuous probability distributions for random
vectors that consist of two independent scalar random variables $X_1$
and $X_2$, both belonging to the same location-scale family $I$, is
invariant under
$\mathcal{G} = \{\mathbf{g}_a : (X_1, X_2) \to (a_2 X_1 + a_1, a_2 X_2 + a_1) ; (a_1, a_2) \in \mathbb{R} \times \mathbb{R}^+\}$,
while the corresponding transformations from the induced group are again
(26).
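
The group law (24) and the invariance of a location-scale family under (25) and (26) can be checked on a concrete case. The sketch below assumes the normal family as the illustrative member of (20); the function names and the numerical values in the usage section are hypothetical choices made here.

```python
import math

IDENTITY = (0.0, 1.0)  # unit element e = (a1, a2) = (0, 1) of the group (24)

def compose(a, b):
    """Group operation (24): a o b = (a2*b1 + a1, a2*b2)."""
    a1, a2 = a
    b1, b2 = b
    return (a2 * b1 + a1, a2 * b2)

def inverse(a):
    """Inverse element with respect to (24)."""
    a1, a2 = a
    return (-a1 / a2, 1.0 / a2)

def act_on_x(a, x):
    """Transformation (25): x -> a2*x + a1."""
    a1, a2 = a
    return a2 * x + a1

def act_on_theta(a, theta):
    """Induced transformation (26): (mu, sigma) -> (a2*mu + a1, a2*sigma)."""
    a1, a2 = a
    mu, sigma = theta
    return (a2 * mu + a1, a2 * sigma)

def cdf(x, theta):
    """Normal cdf, used here as an assumed example of the form (20)."""
    mu, sigma = theta
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

if __name__ == "__main__":
    a, b = (1.5, 2.0), (-0.7, 0.5)
    x, theta = 0.3, (0.2, 1.3)
    # Composition of transformations matches the group operation.
    print(act_on_x(a, act_on_x(b, x)), act_on_x(compose(a, b), x))
    # Invariance: transforming x and the parameter together leaves the cdf unchanged.
    print(cdf(act_on_x(a, x), act_on_theta(a, theta)), cdf(x, theta))
```

The invariance check works because the shift and the scale factor cancel in $(x - \mu)/\sigma$ when $x$ and $(\mu, \sigma)$ are transformed together.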



Lemma 1. Let $G$ be a one-dimensional Lie group, let a function
$l(a, x) : G \times \mathbb{R} \to \mathbb{R}$ give rise to a group
$\mathcal{G} = \{g_a : a \in G\}$ of transformations
$g_a : \mathbb{R} \to \mathbb{R}$, and let $l(a, x)$ be differentiable
both in $a$ and in $x$, $\forall a \in G$ and
$\forall x \in \mathbb{R}$. Then, for all $x \in \mathbb{R}$ for which

(27)  $\partial_a l(a^{-1}, x)\big|_{a=e}$

vanishes, all group transformations are trivial, i.e., $g_a(x) = x$ for
all $g_a \in \mathcal{G}$.

Clearly, if (27) vanishes for all real $x$, then the action of the group
$G$ on the entire real axis is trivial: $g_a(x) = x$ for every
$g_a \in \mathcal{G}$ and for all $x \in \mathbb{R}$.

Lemma 2. Suppose a probability distribution for a continuous scalar
random variable $X$ belongs to a family $I$ of parametric distributions
that is invariant under the action of a one-dimensional Lie group $G$.
Let, in addition, the left actions $l(a, x)$ and $\bar{l}(a, \lambda)$
be differentiable in $a$, $x$ and $\lambda$ for all $a \in G$,
$x \in V_X(\lambda)$ and $\lambda \in V_\Lambda(x)$, let the action of
the group $G$ not be identically trivial on the entire support
$V_X(\lambda)$, and let the cdf for $X$, $F_I(x|\lambda)$, be
differentiable in $\lambda$ (differentiability in $x$ is guaranteed by
Definition 5). Then, the partial derivative

(28)  $\partial_a \bar{l}(a^{-1}, \lambda)\big|_{a=e}$

does not vanish anywhere on the space $V_\Lambda$ of the (scalar)
parameter $\Lambda$ of the family $I$.

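Furthermore, for a continuous scalar random variable $X$ whose
probability distribution belongs to a family $I$ of parametric
distributions that is invariant under the action of a group $G$,
equation (7) reduces to

(29)  $F_I(x|\lambda) = \begin{cases} F_I\bigl(l(a^{-1}, x)\,\big|\,\bar{l}(a^{-1}, \lambda)\bigr) \ ; & [g_a^{-1}(x)]' > 0 \\ 1 - F_I\bigl(l(a^{-1}, x)\,\big|\,\bar{l}(a^{-1}, \lambda)\bigr) \ ; & [g_a^{-1}(x)]' < 0 \end{cases}$

for all $a \in G$. On the subspace
$\widetilde{V}_X \subseteq V_X(\lambda)$ with non-vanishing derivatives
(27), derivatives $\partial_a \bar{l}(a^{-1}, \lambda)\big|_{a=e}$ are
non-zero by Lemma 2. Then, for $x \in \widetilde{V}_X$, differentiating
(29) with respect to $a$ and setting afterwards $a = e$ yields

(30)  $\partial_\lambda F_I(x|\lambda)\, \partial_x H(x, \lambda) - \partial_x F_I(x|\lambda)\, \partial_\lambda H(x, \lambda) = 0$ ,

where

(31)  $H(x, \lambda) \equiv s(x) - \bar{s}(\lambda)$

and

(32)  $[s'(x)]^{-1} \equiv \partial_a l(a^{-1}, x)\big|_{a=e}$

and

(33)  $[\bar{s}'(\lambda)]^{-1} \equiv \partial_a \bar{l}(a^{-1}, \lambda)\big|_{a=e}$ .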



Lemma 3. The cdf $F_I(x|\lambda)$ that solves the functional equation
(30) is a differentiable function of a single variable $H(x, \lambda)$,

(34)  $F_I(x|\lambda) = \Phi[H(x, \lambda)]$ .

Consequently, the cdf $F_I(x|\lambda)$ from a parametric family $I$ that
is invariant under the action of a one-dimensional Lie group can be
written as

(35)  $F_I(x|\lambda) = \Phi[s(x) - \bar{s}(\lambda)] = \Phi(y - \mu)$ ,

where $y \equiv s(x)$ and $\mu \equiv \bar{s}(\lambda)$ have been
introduced. Then, by equation (7), the cdf for the continuous random
variable $Y = s \circ X$ is of the form

     $F_{I'}(y|\mu, \sigma = 1) = \begin{cases} \Phi(y - \mu) \ ; & [s^{-1}(y)]' > 0 \\ \bar{\Phi}(y - \mu) \ ; & [s^{-1}(y)]' < 0 \end{cases}$

where $\bar{\Phi}(y - \mu) \equiv 1 - \Phi(y - \mu)$. That is, the
probability distribution for the continuous random variable $Y$ belongs
to a location-scale family $I'$ with $\sigma = 1$ (recall equation
(20)), and the above reasoning can be summarized as

Proposition 2. Let $X$ be a continuous scalar random variable whose
probability distribution belongs to a $\mathcal{G}$-invariant
parametric family $I$, where $\mathcal{G} = \{g_a : a \in G\}$ is
underlain by a one-dimensional Lie group $G$. Let, in addition,
$g_a(x)$ be differentiable for all $x \in \mathbb{R}$ and let the cdf
$F_I(x|\lambda)$ for $X$ be differentiable in $\lambda$. Then, on the
subspace $\widetilde{V}_X \subseteq V_X(\lambda)$ with non-vanishing
derivatives (27), $X$ is reducible by a one-to-one transformation $s$
(32) to a continuous random variable $Y \equiv s \circ X$ whose
probability distribution is from a location-scale family (20) with
$\sigma = 1$ and $\mu = \bar{s}(\lambda)$, where $\bar{s}$ is defined
via (33).
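
The reduction described in Proposition 2 can be traced on a simple case. The sketch below is an illustration under assumptions that are not part of the text: the multiplicative group acting as $l(a, x) = a x$, the exponential scale family as the invariant family, and all function names. For this choice the derivative (27) equals $-x$, so (32) and (33) give $s(x) = -\ln x$ and $\bar{s}(\lambda) = -\ln\lambda$ up to additive constants, and the cdf is a function of $s(x) - \bar{s}(\lambda)$ alone, as in (35).

```python
import math

def left_action(a, x):
    """Assumed action l(a, x) = a * x of the multiplicative group on x > 0."""
    return a * x

def derivative_27(x, h=1e-6):
    """Numerical d/da of l(a^{-1}, x) at the unit element a = 1, cf. (27)."""
    forward = left_action(1.0 / (1.0 + h), x)
    backward = left_action(1.0 / (1.0 - h), x)
    return (forward - backward) / (2.0 * h)

def s(x):
    """Solution of (32) for this action: s'(x) = -1/x, hence s(x) = -ln x."""
    return -math.log(x)

def s_bar(lam):
    """Solution of (33) for the induced action lam -> a * lam."""
    return -math.log(lam)

def cdf(x, lam):
    """Exponential cdf with scale lam; invariant under x -> a*x, lam -> a*lam."""
    return 1.0 - math.exp(-x / lam)

def phi(h):
    """Single-variable function with cdf(x, lam) = phi(s(x) - s_bar(lam)), cf. (34)."""
    return 1.0 - math.exp(-math.exp(-h))

if __name__ == "__main__":
    print(derivative_27(2.0))  # close to -2.0
    for x, lam in [(0.5, 1.0), (2.0, 0.7), (3.3, 4.1)]:
        print(cdf(x, lam), phi(s(x) - s_bar(lam)))
```

With the sign convention of (32), $s$ is decreasing here, so the transformed variable falls under the second branch of the display that follows (35).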



Remark 3. In the sequel (Proposition H]) we 
shall further demonstrate that for realizations
$x \in V_X(\lambda) - \widetilde{V}_X$ with vanishing derivative (27), a
pdf cannot be assigned to the inferred parameter of the family $I$.

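Let a continuous random variable $\mathbf{X}$ with a pdf
$f_I(\mathbf{x}|\theta)$ belong to a parametric family $I$ that is
invariant under a group $\mathcal{G}$ of differentiable transformations
$\mathbf{g}_a$ with non-vanishing Jacobian
$|\partial_{\mathbf{x}} \mathbf{g}_a(\mathbf{x})|$ on the entire support
$V_{\mathbf{X}}(\theta)$ of the distribution for $\mathbf{X}$. Then,
equation (9) applies which, when combined with the definition (23) of
invariance of a family $I$, yields

     $f_I(\mathbf{y}|\lambda) = f_I\bigl(\mathbf{g}_a^{-1}(\mathbf{y})\,\big|\,\bar{\mathbf{g}}_a^{-1}(\lambda)\bigr)\, |\partial_{\mathbf{y}} \mathbf{g}_a^{-1}(\mathbf{y})|$

for all $\mathbf{y} = \mathbf{g}_a(\mathbf{x})$ such that
$\mathbf{x} \in V_{\mathbf{X}}(\theta)$, where
$\lambda = \bar{\mathbf{g}}_a(\theta)$ and
$\bar{\mathbf{g}}_a \in \bar{\mathcal{G}}$.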

3. INVERSE PROBABILITY DISTRIBUTIONS 

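Definition 15 (Inverse probability distributions). Suppose there exist
probability spaces $(\Omega_\theta, \Sigma_\theta, P)$,
$\Omega_\theta \subset \Omega$, for all $\theta \in V_\Theta$ and a
random variable
$(\Theta, \mathbf{X}) : \Omega_\theta \to (\theta, \mathbb{R}^n)$ that
together lead to the parametric family $I$ of continuous direct
probability distributions
$\Pr_I(S_{\Theta=\theta,\mathbf{X}}|\theta)$,
$S_{\Theta=\theta,\mathbf{X}} \in \mathcal{B}^n$, whose pdf's are
denoted by $f_I(\mathbf{x}|\theta)$. Let, in addition, for some of those
realizations $\mathbf{x}$ of $\mathbf{X}$ for which

(36)  $\int_{V_\Theta} f_I(\mathbf{x}|\theta)\, d^m\theta > 0$

there exist also probability spaces
$(\Omega_{\mathbf{x}}, \Sigma_{\mathbf{x}}, P)$,
$\Omega_{\mathbf{x}} \subset \Omega$, such that the function
$(\Theta, \mathbf{X}) : \Omega_{\mathbf{x}} \to (V_\Theta, \mathbf{x})$
is $\Sigma_{\mathbf{x}}$-measurable (i.e.,
$A_{\Theta \le \theta} \equiv \{\omega \in \Omega_{\mathbf{x}} : \Theta \le \theta\} \in \Sigma_{\mathbf{x}}$
for all $\theta \in V_\Theta$) and thus a random variable also on
$(\Omega_{\mathbf{x}}, \Sigma_{\mathbf{x}}, P)$. Then, the probability
distributions, resulting from the probability spaces
$(\Omega_{\mathbf{x}}, \Sigma_{\mathbf{x}}, P)$ and from the
corresponding random variable $(\Theta, \mathbf{X})$, are called inverse
probability distributions. The cdf's and the pdf's that correspond to
the inverse probability distributions are denoted by
$F_I(\theta|\mathbf{x})$ and $f_I(\theta|\mathbf{x})$, respectively.

Likewise, let $(\Theta, \mathbf{X})$ be further partitioned into
$(\Theta_1, \Theta_2, \mathbf{X})$ and let for some of those
realizations $\theta_2$ and $\mathbf{x}$ for which

(37)  $\int_{V_{\Theta_1,\theta_2}} f_I(\mathbf{x}|\theta_1, \theta_2)\, d^{m_1}\theta_1 > 0$

there exist probability spaces
$(\Omega_{\theta_2,\mathbf{x}}, \Sigma_{\theta_2,\mathbf{x}}, P)$ such
that the function
$(\Theta_1, \Theta_2, \mathbf{X}) : \Omega_{\theta_2,\mathbf{x}} \to (V_{\Theta_1,\theta_2}, \theta_2, \mathbf{x})$
is $\Sigma_{\theta_2,\mathbf{x}}$-measurable,
$A_{\Theta_1 \le \theta_1} \equiv \{\omega \in \Omega_{\theta_2,\mathbf{x}} : \Theta_1 \le \theta_1\} \in \Sigma_{\theta_2,\mathbf{x}}$
for all $(\theta_1, \theta_2) \in V_{\Theta_1,\Theta_2}$. Then, the
cdf's and the pdf's that correspond to the resulting inverse probability
distributions are denoted by $F_I(\theta_1|\theta_2, \mathbf{x})$ and
$f_I(\theta_1|\theta_2, \mathbf{x})$, respectively.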



Remark 4. The integrals (36) and (37) need not be finite. The reasons for requiring the two integrals to be strictly positive will become apparent within the context of Proposition 3 below.

Apart from the direct and the inverse probability distributions, their mixtures may also exist. For example, $F_I(\theta,x_1|x_2)$, $F_I(\theta_1,x_1|\theta_2,x_2)$, $f_I(\theta,x_1|x_2)$ and $f_I(\theta_1,x_1|\theta_2,x_2)$ are the cdf's and the pdf's of two of the distributions that are neither purely direct nor purely inverse.

From a mathematical perspective, the direct and the inverse probability distributions, as well as their mixtures, share identical properties, some of which were discussed in Section 2.1. The following three rules that apply to inverse probability distributions are obtained by invoking the equivalence between the two types of distributions.

Rule 1 (Parameter transformation). Let $f_I(\theta|x)$ be a pdf of an inverse probability distribution and let $(\bar s, s) : (\Theta, X) \to (\bar s \circ \Theta, s \circ X) = (\Lambda, Y)$ be a differentiable transformation with a non-vanishing Jacobian on the entire support of $f_I(\theta|x)$. Then, an inverse pdf $f_{I'}(\lambda|y)$ also exists and is related to $f_I(\theta|x)$ as

(38) $$f_{I'}(\lambda|y) = f_I\big(\bar s^{-1}(\lambda)\,\big|\,s^{-1}(y)\big)\,\big|\partial_\lambda \bar s^{-1}(\lambda)\big| .$$

Similarly, when there exist an inverse pdf $f_I(\theta_1|\theta_2,x)$ and a differentiable transformation $(\bar s_1, \bar s_2, s) : (\Theta_1, \Theta_2, X) \to (\bar s_1 \circ \Theta_1, \bar s_2 \circ \Theta_2, s \circ X) = (\Lambda_1, \Lambda_2, Y)$ with a non-vanishing Jacobian on the support of $f_I(\theta_1|\theta_2,x)$, there exists a pdf $f_{I'}(\lambda_1|\lambda_2,y)$ such that

(39) $$f_{I'}(\lambda_1|\lambda_2,y) = f_I\big(\bar s_1^{-1}(\lambda_1)\,\big|\,\bar s_2^{-1}(\lambda_2), s^{-1}(y)\big)\,\big|\partial_{\lambda_1} \bar s_1^{-1}(\lambda_1)\big| .$$

Proof. If $f_I(\theta,x)$ exists, equation (38) follows from (16) by substitutions $X_1 \to \Theta$, $X_2 \to X$, $Y_1 \to \Lambda$, $Y_2 \to Y$, $s_1 \to \bar s$ and $s_2 \to s$. Similarly, if $f_I(\theta_1,\theta_2,x)$ exists, (39) is deduced from (16) by substitutions $X_1 \to \Theta_1$, $X_2 \to (\Theta_2, X)$, $Y_1 \to \Lambda_1$, $Y_2 \to (\Lambda_2, Y)$, $s_1 \to \bar s_1$ and $s_2 \to (\bar s_2, s)$. If, on the other hand, the joint pdf's $f_I(\theta,x)$ and $f_I(\theta_1,\theta_2,x)$ do not exist, equations (38) and (39) are definitions for $f_{I'}(\lambda|y)$ and $f_{I'}(\lambda_1|\lambda_2,y)$, respectively, in the same way as $f_{I'}(y_1|y_2)$ was defined by (17). □

Rule 2 (Product rule). Let there exist an inverse pdf $f_I(\theta_1,\theta_2|x)$ and the corresponding marginal pdf

$$f_I(\theta_2|x) = \int f_I(\theta_1,\theta_2|x)\, d^{m_1}\theta_1 .$$

Then, for all $\theta_2$ and $x$ for which $f_I(\theta_2|x) > 0$,

(40) $$f_I(\theta_1|\theta_2,x) = \frac{f_I(\theta_1,\theta_2|x)}{f_I(\theta_2|x)}$$

holds uniquely (Lebesgue measure) $\nu_L$-almost everywhere on $\mathbb{R}^{m_1}$.

Proof. The product rule (40) follows immediately from (13) by making substitutions $x_1 \to \theta_1$, $x_2 \to \theta_2$ and $x_3 \to x$. □

Rule 3 (Bayes' Theorem). Let a random vector be partitioned into $(\Theta_1, \Theta_2, X_1, X_2)$, let there exist pdf's $f_I(\theta_1,x_1|\theta_2,x_2)$ and $f_I(\theta_1,x_2|\theta_2,x_1)$, let marginal pdf's $f_I(x_1|\theta_2,x_2)$, $f_I(x_2|\theta_2,x_1)$, $f_I(\theta_1|\theta_2,x_2)$ and $f_I(\theta_1|\theta_2,x_1)$ be non-vanishing, and let the components $X_1$ and $X_2$ of the partition be independent random variables: $f_I(x_1|\theta_1,\theta_2,x_2) = f_I(x_1|\theta_1,\theta_2)$ and $f_I(x_2|\theta_1,\theta_2,x_1) = f_I(x_2|\theta_1,\theta_2)$. Then, there exists a conditional pdf $f_I(\theta_1|\theta_2,x_1,x_2)$ such that

(41) $$f_I(\theta_1|\theta_2,x_1,x_2) = \frac{f_I(\theta_1|\theta_2,x_2)\, f_I(x_1|\theta_1,\theta_2)}{f_I(x_1|\theta_2,x_2)} = \frac{f_I(\theta_1|\theta_2,x_1)\, f_I(x_2|\theta_1,\theta_2)}{f_I(x_2|\theta_2,x_1)} .$$

If, on the other hand, a random vector is partitioned into $(\Theta, X_1, X_2)$,

(42) $$f_I(\theta|x_1,x_2) = \frac{f_I(\theta|x_2)\, f_I(x_1|\theta)}{f_I(x_1|x_2)} = \frac{f_I(\theta|x_1)\, f_I(x_2|\theta)}{f_I(x_2|x_1)}$$

holds true under analogous conditions.

Proof. Equation (41) follows from (14) by making substitutions $y \to \theta_1$, $z \to \theta_2$, $t \to x_1$ and $w \to x_2$, whereas (42) is obtained from (15) by substitutions $y \to \theta$, $t \to x_1$ and $w \to x_2$. □



Equations (41) and (42) are also referred to as Bayes' Theorem (Bayes (1763); Laplace (1774)) or the principle of inverse probability (Jeffreys (1961), §1.22, p. 28), written in terms of pdf's. In the equations, $f_I(\theta_1|\theta_2,x_1,x_2)$ and $f_I(\theta|x_1,x_2)$ are called the posterior pdf's, $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ are the so-called prior pdf's, $f_I(x_{1,2}|\theta_1,\theta_2)$ and $f_I(x_{1,2}|\theta)$ are the likelihood densities, while $f_I(x_{1,2}|\theta_2,x_{2,1})$ and $f_I(x_{1,2}|x_{2,1})$ are the predictive pdf's. While the predictive pdf's are determined by the normalization condition on the posterior pdf's, e.g.,

$$f_I(x_{1,2}|\theta_2,x_{2,1}) = \int f_I(\theta_1|\theta_2,x_{2,1})\, f_I(x_{1,2}|\theta_1,\theta_2)\, d^{m_1}\theta_1 ,$$

the general form of the prior pdf's $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ is prescribed by the following Proposition.

Proposition 3. Suppose that conditions for Bayes' Theorem (41) are fulfilled: a random vector is partitioned into $(\Theta_1, \Theta_2, X_1, X_2)$, there exist conditional pdf's $f_I(\theta_1,x_1|\theta_2,x_2)$ and $f_I(\theta_1,x_2|\theta_2,x_1)$, the marginal pdf's $f_I(x_1|\theta_2,x_2)$, $f_I(x_2|\theta_2,x_1)$, $f_I(\theta_1|\theta_2,x_2)$ and $f_I(\theta_1|\theta_2,x_1)$ are positive, and the components $X_1$ and $X_2$ of the partition are independent random variables with identical probability distribution. In addition, let $V_\Theta = (V_{\Theta_1}, V_{\Theta_2})$ stand for the space of the parameter $\Theta = (\Theta_1, \Theta_2)$ and let $V_{\Theta_1}(x_{1,2},\theta_2) = \{\theta_1 \in V_{\Theta_1} : f_I(x_{1,2}|\theta_1,\theta_2) > 0\}$. Then, for $\theta_1 \in V_{\Theta_1}(x_{1,2},\theta_2)$,

(43) $$f_I(\theta_1|\theta_2,x_{1,2}) = \frac{\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)}{\eta_{I,\Theta_1|\theta_2}(x_{1,2},\theta_2)}\, f_I(x_{1,2}|\theta_1,\theta_2)$$

is the most general form of the pdf's $f_I(\theta_1|\theta_2,x_{1,2})$. Similarly, when a random vector is partitioned into $(\Theta, X_1, X_2)$, the conditions for Bayes' Theorem (42) are fulfilled and $\theta \in V_\Theta(x_{1,2})$, $V_\Theta(x_{1,2}) = \{\theta \in V_\Theta : f_I(x_{1,2}|\theta) > 0\}$,

(44) $$f_I(\theta|x_{1,2}) = \frac{\zeta_{I,\Theta}(\theta)}{\eta_{I,\Theta}(x_{1,2})}\, f_I(x_{1,2}|\theta)$$

is the most general form of the pdf's $f_I(\theta|x_{1,2})$. The functions $\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)$ and $\zeta_{I,\Theta}(\theta)$ in equations (43) and (44) are called the consistency factors.

Domains of $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ are extended beyond the supports $V_{X_1}(\theta_1,\theta_2) = V_{X_2}(\theta_1,\theta_2)$ on which $f_I(x_{1,2}|\theta_1,\theta_2)$ and $f_I(x_{1,2}|\theta)$ are positive by defining

$$f_I(\theta_1|\theta_2,x_{1,2}) = \frac{\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)}{\eta_{I,\Theta_1|\theta_2}(x_{1,2},\theta_2)}\, f_I(x_{1,2}|\theta_1,\theta_2)$$

for all $x_{1,2} \notin V_{X_{1,2}}(\theta_1,\theta_2)$ and

$$f_I(\theta|x_{1,2}) = \frac{\zeta_{I,\Theta}(\theta)}{\eta_{I,\Theta}(x_{1,2})}\, f_I(x_{1,2}|\theta)$$

for all $x_{1,2} \notin V_{X_{1,2}}(\theta)$. For the sake of symmetry between the direct and the inverse probability distributions, the domains of the inverse pdf's may be extended even further by defining $f_I(\theta_1|\theta_2,x_{1,2}) = 0$ for $(\theta_1,\theta_2) \notin V_{\Theta_1,\Theta_2}$ and $f_I(\theta|x_{1,2}) = 0$ for $\theta \notin V_\Theta$. In this way, the inverse probability distribution spaces $(V_x, \Sigma_x, Pr)$ and $(V_{\theta_2,x}, \Sigma_{\theta_2,x}, Pr)$ are also extended to $(\mathbb{R}^m, \mathcal{B}^m, Pr)$ and $(\mathbb{R}^{m_1}, \mathcal{B}^{m_1}, Pr)$, respectively. Then, the normalization factors $\eta_{I,\Theta_1|\theta_2}(x_{1,2},\theta_2)$ and $\eta_{I,\Theta}(x_{1,2})$ are determined by invoking normalization of the pdf's $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$:

$$\eta_{I,\Theta_1|\theta_2}(x_{1,2},\theta_2) = \int \zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)\, f_I(x_{1,2}|\theta_1,\theta_2)\, d^{m_1}\theta_1$$

and

$$\eta_{I,\Theta}(x_{1,2}) = \int \zeta_{I,\Theta}(\theta)\, f_I(x_{1,2}|\theta)\, d^m\theta .$$

Non-vanishing integrals (36) and (37) thus represent necessary conditions for normalizability (5) of the inverse pdf's $f_I(\theta|x_{1,2})$ and $f_I(\theta_1|\theta_2,x_{1,2})$.
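
The role of the normalization factor in (44) can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the unit-variance normal likelihood, the constant consistency factor, the grid limits and all function names are hypothetical choices made for illustration and are not taken from the text.

```python
import math

def likelihood(x, theta):
    """Direct pdf f_I(x|theta): unit-variance normal located at theta (assumed example family)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def zeta(theta):
    """Hypothetical consistency factor, taken constant for this illustration."""
    return 1.0

def inverse_pdf_on_grid(x, lo=-10.0, hi=10.0, n=4000):
    """Return grid points, inverse pdf values, eta(x) and the grid step (midpoint Riemann sum)."""
    step = (hi - lo) / n
    thetas = [lo + (i + 0.5) * step for i in range(n)]
    weights = [zeta(t) * likelihood(x, t) for t in thetas]
    eta = sum(weights) * step          # eta(x): integral of zeta(theta) * f_I(x|theta)
    pdf = [w / eta for w in weights]   # f_I(theta|x) = zeta(theta) * f_I(x|theta) / eta(x)
    return thetas, pdf, eta, step

thetas, pdf, eta, step = inverse_pdf_on_grid(x=1.3)
total = sum(pdf) * step                # numerical check of the normalization of f_I(theta|x)
```

In this sketch, multiplying `zeta` by any constant changes `eta` by the same constant and leaves `pdf` unchanged, which is the cancellation in the ratio of the two factors.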

For discrete random variables $X_1$ and $X_2$, the appropriate forms of the pdf's $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ are obtained by replacing the likelihood densities $f_I(x_{1,2}|\theta_1,\theta_2)$ and $f_I(x_{1,2}|\theta)$ in (43) and (44) with the probability mass functions $p_I(x_{1,2}|\theta_1,\theta_2)$ and $p_I(x_{1,2}|\theta)$ that coincide with probability distributions for the points $X_{1,2} = x_{1,2}$ of a state space $\mathbb{R}^n$ of the variables $X_1$ and $X_2$, given the realizations $(\Theta_1,\Theta_2) = (\theta_1,\theta_2)$ and $\Theta = \theta$ of the corresponding parameters.



Remark 5. In equations (43) and (44), the pdf's $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ are directly proportional to the pdf's $f_I(x_{1,2}|\theta_1,\theta_2)$ and $f_I(x_{1,2}|\theta)$ of the corresponding direct probability distributions. This is very similar to equations (41) and (42) of Bayes' Theorem with the posterior pdf's $f_I(\theta_1|\theta_2,x_1,x_2)$ and $f_I(\theta|x_1,x_2)$ being proportional to the likelihood densities $f_I(x_{1,2}|\theta_1,\theta_2)$ and $f_I(x_{1,2}|\theta)$. But there is also a fundamental difference between the equations of Bayes' Theorem and those of Proposition 3: while the proportionality coefficients $f_I(\theta_1|\theta_2,x_{1,2})$ and $f_I(\theta|x_{1,2})$ between the posterior pdf's and the likelihood densities in Bayes' Theorem are the prior pdf's, the consistency factors $\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)$ and $\zeta_{I,\Theta}(\theta)$ that are proportionality coefficients between the inverse and the direct pdf's in (43) and (44) need not be congruent with all the properties of probability density functions and should therefore not be confused with the so-called non-informative prior pdf's $f_I(\theta_{1,2}|\theta_{2,1})$ and $f_I(\theta)$ (see also below). The properties of the consistency factors are extensively discussed in the next section.
4. THE CONSISTENCY FACTORS

4.1 General properties of the consistency factors 

According to Proposition 3, for a consistent assignment of inverse probability distributions, the appropriate consistency factors $\zeta_{I,\Theta}(\theta)$ and $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$ need be uniquely determined. In what follows, we discuss some of the properties of the consistency factors that will be invoked during their determination.

Property 1 (Uniqueness). A consistency factor $\zeta_{I,\Theta}(\theta)$ can only be determined up to a factor $\chi_{I,\Theta}(x_{1,2})$ that is an arbitrary function of $x_{1,2}$. Also, $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$ is determined only up to an arbitrary multiplier $\chi_{I,\Theta_{1,2}|\theta_{2,1}}(x_{1,2},\theta_{2,1})$.

Proof. Multiplying $\zeta_{I,\Theta}(\theta)$ by $\chi_{I,\Theta}(x_{1,2})$ results in multiplying $\eta_{I,\Theta}(x_{1,2})$ by the same factor, such that the factor cancels in the ratio $\zeta_{I,\Theta}(\theta)/\eta_{I,\Theta}(x_{1,2})$. Identical arguments apply when $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$ is multiplied by $\chi_{I,\Theta_{1,2}|\theta_{2,1}}(x_{1,2},\theta_{2,1})$. □

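Property 2 (Sign). A consistency factor $\zeta_{I,\Theta}(\theta)$ is either positive or negative on the parameter space $V_\Theta$, and so is $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$ on $V_{\Theta_{1,2}|\theta_{2,1}}$.

Proof. The normalization factors $\eta_{I,\Theta}(x_{1,2})$ are either positive or negative, and the pdf's $f_I(\theta|x_{1,2})$ and $f_I(x_{1,2}|\theta)$ are non-negative, such that $\zeta_{I,\Theta}(\theta)$ must be of the same sign as $\eta_{I,\Theta}(x_{1,2})$, i.e., either positive or negative for all $\theta \in V_\Theta$. The same holds true for $\eta_{I,\Theta_{1,2}|\theta_{2,1}}(x_{1,2},\theta_{2,1})$, $f_I(\theta_{1,2}|\theta_{2,1},x_{1,2})$, $f_I(x_{1,2}|\theta_1,\theta_2)$ and $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$. □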

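Property 3 (Transformations). Suppose that the premises of Proposition 3 are fulfilled such that pdf's $f_I(\theta|x_{1,2})$ and $f_I(x_{1,2}|\theta)$ are related according to (44). Let, in addition, $(\bar s, s) : (\Theta, X) \to (\bar s \circ \Theta, s \circ X) = (\Lambda, Y)$ be a differentiable transformation with non-vanishing Jacobians $|\partial_\theta \bar s(\theta)|$ and $|\partial_{x_{1,2}} s(x_{1,2})|$ for all $\theta$ and $x_{1,2}$ for which $f_I(x_{1,2}|\theta)$ is positive. Then, the consistency and the normalization factors that relate $f_{I'}(\lambda|y_{1,2})$ and $f_{I'}(y_{1,2}|\lambda)$ read

(45) $$\zeta_{I',\Lambda}(\lambda) = \chi_{I',\Lambda}\, \zeta_{I,\Theta}[\bar s^{-1}(\lambda)]\, |\partial_\lambda \bar s^{-1}(\lambda)|$$

and

(46) $$\eta_{I',\Lambda}(y_{1,2}) = \chi_{I',\Lambda}\, \eta_{I,\Theta}[s^{-1}(y_{1,2})]\, |\partial_{y_{1,2}} s^{-1}(y_{1,2})| .$$

Similarly, for $f_I(\theta_{1,2}|\theta_{2,1},x_{1,2})$ and $f_I(x_{1,2}|\theta_1,\theta_2)$,

(47) $$\zeta_{I',\Lambda_{1,2}|\lambda_{2,1}}(\lambda_1,\lambda_2) = \chi_{I',\Lambda_{1,2}|\lambda_{2,1}} \times \zeta_{I,\Theta_{1,2}|\bar s_{2,1}^{-1}(\lambda_{2,1})}[\bar s_1^{-1}(\lambda_1), \bar s_2^{-1}(\lambda_2)]\, |\partial_{\lambda_{1,2}} \bar s_{1,2}^{-1}(\lambda_{1,2})|$$

and

(48) $$\eta_{I',\Lambda_{1,2}|\lambda_{2,1}}(y_{1,2},\lambda_{2,1}) = \chi_{I',\Lambda_{1,2}|\lambda_{2,1}} \times \eta_{I,\Theta_{1,2}|\bar s_{2,1}^{-1}(\lambda_{2,1})}[s^{-1}(y_{1,2}), \bar s_{2,1}^{-1}(\lambda_{2,1})]\, |\partial_{y_{1,2}} s^{-1}(y_{1,2})|$$

are the transformations of the consistency and the normalization factors that are induced by the transformations $(\bar s_1, \bar s_2, s) : (\Theta_1, \Theta_2, X) \to (\bar s_1 \circ \Theta_1, \bar s_2 \circ \Theta_2, s \circ X) = (\Lambda_1, \Lambda_2, Y)$ of the random variable $(\Theta_1, \Theta_2, X)$.

Proof. Combining equations (43) and (39) results in

$$f_{I'}(\lambda_{1,2}|\lambda_{2,1},y_{1,2}) = \frac{\zeta_{I,\Theta_{1,2}|\bar s_{2,1}^{-1}(\lambda_{2,1})}[\bar s_1^{-1}(\lambda_1), \bar s_2^{-1}(\lambda_2)]\, |\partial_{\lambda_{1,2}} \bar s_{1,2}^{-1}(\lambda_{1,2})|}{\eta_{I,\Theta_{1,2}|\bar s_{2,1}^{-1}(\lambda_{2,1})}[s^{-1}(y_{1,2}), \bar s_{2,1}^{-1}(\lambda_{2,1})]\, |\partial_{y_{1,2}} s^{-1}(y_{1,2})|}\, f_{I'}(y_{1,2}|\lambda_1,\lambda_2) ,$$

which, when compared to the relation

$$f_{I'}(\lambda_{1,2}|\lambda_{2,1},y_{1,2}) = \frac{\zeta_{I',\Lambda_{1,2}|\lambda_{2,1}}(\lambda_1,\lambda_2)}{\eta_{I',\Lambda_{1,2}|\lambda_{2,1}}(y_{1,2},\lambda_{2,1})}\, f_{I'}(y_{1,2}|\lambda_1,\lambda_2) ,$$

implied by Proposition 3, yields (47) and (48). In the same way, (45) and (46) are obtained if (39) is replaced by (38). □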



For invariant families $I$ of direct probability distributions, equations (45) and (47) reduce to functional equations

(49) $$\zeta_{I,\Theta}(\theta) = \chi_{I,\Theta}(a)\, \zeta_{I,\Theta}[\bar g_a^{-1}(\theta)]\, |\partial_\theta \bar g_a^{-1}(\theta)|$$

and

(50) $$\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2) = \chi_{I,\Theta_{1,2}|\theta_{2,1}}(a) \times \ldots$$

for the consistency factors $\zeta_{I,\Theta}(\theta)$ and $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$, respectively. It should be noticed that the usual multipliers $\chi_{I,\Theta}$ and $\chi_{I,\Theta_{1,2}|\theta_{2,1}}$, up to which the two consistency factors are uniquely determined (Property 1), may depend on the parameters $a$ of the transformations (on the group elements $a$), i.e., the consistency factors for the parameters of invariant parametric families of direct probability distributions are to be relatively invariant under $\bar{\mathcal{G}}$.

Apart from the invariance of the consistency factors, invariance of a family $I$ of direct distributions under a group $\mathcal{G}$ also implies invariance of the family of the corresponding inverse distributions under the induced group $\bar{\mathcal{G}}$. Let, for example, $I$ be an invariant parametric family of continuous direct probability distributions of a scalar random variable $X$, whose scalar parameter is denoted by $\Theta$. Then, according
to dZl), 
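(51) $$F_I(\theta|x) = \begin{cases} F_I\big(\bar h(a^{-1},\theta)\,\big|\,h(a^{-1},x)\big) , & \bar g_a'(\theta) > 0 \\ 1 - F_I\big(\bar h(a^{-1},\theta)\,\big|\,h(a^{-1},x)\big) , & \bar g_a'(\theta) < 0 \end{cases} .$$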

4.2 Invariance under discrete groups of 
transformations 

Under what circumstances do the functional equations (49) and (50) lead to unique solutions $\zeta_{I,\Theta}(\theta)$ and $\zeta_{I,\Theta_{1,2}|\theta_{2,1}}(\theta_1,\theta_2)$?

Example 7 (Parity). Let a parametric family of continuous direct probability distributions be invariant under a discrete group $\mathcal{G}$ of transformations $g_a : X \to aX$ with $\bar g_a : \Theta \to a\Theta$ being the corresponding transformations from the induced group, where the underlying group $G$ consists of two elements, $a = \pm 1$. That is, the distributions from the considered family have (positive) parity under simultaneous inversions of the spaces of $X$ and $\Theta$. By combining $\zeta_{I,\Theta}(\theta) = \chi_{I,\Theta}(a)\, \zeta_{I,\Theta}[\bar g_a^{-1}(\theta)]$ and $\zeta_{I,\Theta}[\bar g_a^{-1}(\theta)] = \chi_{I,\Theta}(a)\, \zeta_{I,\Theta}\{\bar g_a^{-1}[\bar g_a^{-1}(\theta)]\}$ and setting $a = -1$ we obtain $\zeta_{I,\Theta}(\theta) = \chi_{I,\Theta}(-1)\, \zeta_{I,\Theta}(-\theta)$ and $\zeta_{I,\Theta}(\theta) = [\chi_{I,\Theta}(-1)]^2\, \zeta_{I,\Theta}(\theta)$, such that $[\chi_{I,\Theta}(-1)]^2 = 1$. When inability of $\zeta_{I,\Theta}(\theta)$ to switch sign is invoked (Property 2), this further implies $\zeta_{I,\Theta}(-\theta) = \zeta_{I,\Theta}(\theta)$. That is, $\zeta_{I,\Theta}(\theta)$ must have positive parity under the inversion $\Theta \to -\Theta$, but apart from this, it can take any form and so in this case equation (49) does not lead to a unique solution.
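
The non-uniqueness in the parity example can be made concrete with a small numerical check. This is a sketch only: the candidate functions, the sample points and the helper name are assumptions chosen for illustration. It tests the relation $\zeta(\theta) = \chi\,\zeta(-\theta)$ (the case $a = -1$, where the Jacobian is unity) for several even functions and for one function without definite parity.

```python
import math

def satisfies_parity_equation(zeta, chi, thetas, tol=1e-12):
    """Check zeta(theta) == chi * zeta(-theta) on the sample points (a = -1, unit Jacobian)."""
    return all(abs(zeta(t) - chi * zeta(-t)) <= tol for t in thetas)

samples = [0.1 * k for k in range(-30, 31)]

# Several different positive even functions: each one solves the equation with chi = 1.
even_candidates = {
    "constant": lambda t: 1.0,
    "gaussian-shaped": lambda t: math.exp(-t * t),
    "quartic": lambda t: 1.0 + t ** 4,
}
results = {name: satisfies_parity_equation(f, 1.0, samples)
           for name, f in even_candidates.items()}

# A positive function without definite parity fails for chi = +1 and for chi = -1.
tilted = lambda t: math.exp(t)
tilted_ok = any(satisfies_parity_equation(tilted, chi, samples) for chi in (1.0, -1.0))
```

All three even candidates pass while the tilted one does not, so in this sketch the invariance condition restricts the symmetry of the factor but not its shape.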



It is not difficult to understand that this is a common feature of all solutions based on invariance of parametric families under discrete groups. If the symmetry group is discrete, the spaces of $X$ and $\Theta$ break up in intervals, the so-called fundamental regions or domains of the group (Wigner (1959), §19.1, p. 210; Jaynes (2003), §10.9, p. 332), with no connections in terms of group transformations within the points of the same interval. We are then free to choose the form of $\zeta_{I,\Theta}(\theta)$ in one of these intervals (e.g., we can choose $\zeta_{I,\Theta}(\theta)$ for the positive values of $\theta$ in the above example), hence the invariance of a family $I$ under a discrete group $\mathcal{G}$ alone does not lead to a unique form of the corresponding consistency factor. The argument applies, for example, for all parametric families of discrete probability distributions.

4.3 Consistency factors and invariance under Lie 
groups 

Let $\mathcal{G} = \{g_a : \mathbb{R} \to \mathbb{R} ;\ a \in G\}$ be a group and $G$ be a one-dimensional Lie group. Then, according to Proposition 2, on the subspace $V_X' \subseteq V_X$ with non-vanishing derivative (27), every $\mathcal{G}$-invariant parametric family $I$ of continuous direct probability distributions is necessarily isomorphic to a location-scale family $I'$ with the realization $\sigma = 1$ of the scale parameter $\Theta_2$. Since the fundamental domain of the group $G$ of translations on the real axis consists of a single point, the space of all possible realizations of a location parameter is a homogeneous space for the group (i.e., the space is said to be a single $G$-orbit).

The implications of Proposition 2 may be extended to the subspaces $V_X - V_X'$:

Proposition 4. Let $\mathcal{G} = \{g_a : a \in G\}$ be a group of transformations $g_a : \mathbb{R} \to \mathbb{R}$ and $G$ be a one-dimensional Lie group. Suppose, in addition, that a parametric family $I$ of continuous direct probability distributions for a scalar random variable $X$ is $\mathcal{G}$-invariant, that the action of $G$ on $\mathbb{R}$ is not identically trivial on entire $V_X$, and that the corresponding cdf's $F_I(x|\lambda)$ are differentiable in $\lambda$. Then, for a realization $x \in V_X - V_X' \subset V_X$ with vanishing derivative (27), the inverse probability distribution whose cdf $F_I(\lambda|x)$ is differentiable in $x$, cannot be assigned. (Existence of derivatives $\partial_x F_I(x|\lambda)$ and $\partial_\lambda F_I(\lambda|x)$ is assured by Definition 15.)
Example 8. Let $I_\mu = \{Pr_{I,(\mu,\sigma)} : (\mu,\sigma) \in (\mu, \mathbb{R}^+)\}$ be a sub-family of a continuous location-scale family $I$ that corresponds to the value of the location parameter $\Theta_1$ being fixed to $\mu$. By transformation $X \to X - \mu = Y$, every cdf $F_{I_\mu}(x|\mu,\sigma)$ from $I_\mu$ is reduced to



^/' iy\f^, (^) = Fu {y + mIm, 0-) 



$ 



where $y = x - \mu$. The probability distribution for the random variable $Y$ thus belongs to the family $I'$ that is invariant under transformations $g_a : Y \to aY$ and $\bar g_a : \Theta_2 \to a\Theta_2$ for all $a \in \mathbb{R}^+$. Since the derivative $\partial_a h(a^{-1},y)|_{a=1} = y$ vanishes for $y = 0$, the inverse probability distribution for the scale parameter $\Theta_2$ given $y = 0$ (or, equivalently, given $x = \mu$) does not exist.

In order to assign an inverse probability distribution to a scalar parameter of a family that is invariant under a group $\mathcal{G}$ that is underlain by a one-dimensional Lie group $G$ it therefore suffices to determine the consistency factor $\zeta_{I,\Theta_1}(\mu) = \zeta_{I,\Theta_1|\sigma=1}(\mu,\sigma)$, which can subsequently be transformed, by means of (45), to the corresponding consistency factor $\zeta_{I',\Lambda}(\lambda)$ for the original parameter $\Lambda$. A location-scale family $I_{\sigma=1} = \{Pr_{I,(\mu,\sigma)} : (\mu,\sigma) \in (\mathbb{R}, 1)\}$ of continuous direct probability distributions with the fixed value $\sigma = 1$ of the scale parameter is a subset of the location-scale family $I = \{Pr_{I,(\mu,\sigma)} : (\mu,\sigma) \in \mathbb{R} \times \mathbb{R}^+\}$ that is invariant under the group $\mathcal{G}$ (25). Given a location-scale family $I$, the functional equation (49) for the consistency factor $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ therefore reduces to

(52) $$\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) = h(a_1,a_2)\, \zeta_{I,\Theta_1|\sigma}[(\mu - a_1)/a_2,\ \sigma/a_2] ,$$

$\mu, a_1 \in \mathbb{R}$ and $\sigma, a_2 \in \mathbb{R}^+$, where $h(a_1,a_2) \equiv \chi_{I,\Theta_1|\sigma}(a_1,a_2)/a_2$.

Lemma 4. The solution $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ of equation (52) is a function of $\sigma$ alone, say $\Omega(\sigma)$.

Since $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ is uniquely determined only up to a factor $\chi_{I,\Theta_1|\sigma}(x_{1,2},\sigma)$ (Property 1), $\Omega(\sigma)$ may be, without loss of generality, set to unity, such that

(53) $$\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) = 1$$

regardless of the explicit family $I$ of direct probability distributions, as well as the realization $\sigma$ of the scale parameter.

By using the same arguments as for $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ we find that a consistency factor $\zeta_{I,\Theta_2|\mu}(\mu,\sigma)$ is also a function of $\sigma$ only, say $\zeta_{I,\Theta_2}(\sigma) = \zeta_{I,\Theta_2|\mu}(\mu,\sigma)$. The inverse probability distribution for the scale parameter $\Theta_2$, given $\Theta_1 = \mu$ and $X_1 = x_1 = \mu$, does not exist (Example 8), while for $x_1 \neq \mu$ the pdf $f_I(\sigma|\mu,x_1)$ can be expressed in terms of $f_{I_\pm}(x_1|\mu,\sigma)$ (Section 2):

$$f_I(\sigma|\mu,x_1) = \frac{\zeta_{I,\Theta_2|\mu}(\mu,\sigma)\, f_{I_\pm}(x_1|\mu,\sigma)}{\eta_{I_\pm,\Theta_2|\mu}(x_1,\mu)} ,$$

where $\eta_{I_\pm,\Theta_2|\mu}(x_1,\mu) = \eta_{I,\Theta_2|\mu}(x_1,\mu)\, A_\pm$. By equation (22), every pdf $f_{I_\pm}(x_1|\mu,\sigma)$ is reducible to $f_{I_\pm'}(y_1|\lambda_1,\lambda_2 = 1)$, such that

$$f_{I'}(\lambda_1|\lambda_2 = 1, y_1) = \frac{\zeta_{I',\Lambda_1}(\lambda_1)\, f_{I_\pm'}(y_1|\lambda_1,\lambda_2 = 1)}{\eta_{I_\pm',\Lambda_1|\lambda_2=1}(y_1,\lambda_2 = 1)}$$

holds true and $\zeta_{I',\Lambda_1}(\lambda_1) = \zeta_{I',\Lambda_1|\lambda_2=1}(\lambda_1,\lambda_2) = 1$, where $y_1 = \ln\{\pm(x_1 - \mu)\}$ and $\lambda_1 = \ln\sigma = \bar s(\sigma)$. Since, according to equation (45),

$$\zeta_{I',\Lambda_1}(\lambda_1) = \zeta_{I,\Theta_2}[\bar s^{-1}(\lambda_1)]\, \big|[\bar s^{-1}(\lambda_1)]'\big|$$

must also hold,

(54) $$\zeta_{I,\Theta_2|\mu}(\mu,\sigma) = \sigma^{-1}$$

is the general form of the consistency factor $\zeta_{I,\Theta_2|\mu}(\mu,\sigma)$, again regardless of the explicit location-scale family $I$ of direct probability distributions and the realization $\mu$ of the location parameter.

According to Proposition 3, an inverse pdf $f_I(\mu,\sigma|x_1,x_2)$ for the parameters $\Theta_1$ and $\Theta_2$ of a location-scale family $I$ must be expressible as

$$f_I(\mu,\sigma|x_1,x_2) = \frac{\zeta_{I,\Theta}(\mu,\sigma)\, f_I(x_1,x_2|\mu,\sigma)}{\eta_{I,\Theta}(x_1,x_2)} .$$

For the same reasons as $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ (Lemma 4), $\zeta_{I,\Theta}(\mu,\sigma)$ must also be a function of $\sigma$ alone, say $H(\sigma)$, while the product rule (40) implies factorizability of $f_I(\mu,\sigma|x_1,x_2)$,

(55) $$f_I(\mu,\sigma|x_1,x_2) = f_I(\sigma|\mu,x_1,x_2)\, f_I(\mu|x_1,x_2) ,$$

where, according to Bayes' Theorem (41),

$$f_I(\sigma|\mu,x_1,x_2) = \frac{f_I(\sigma|\mu,x_1)\, f_I(x_2|\mu,\sigma)}{f_I(x_2|\mu,x_1)} .$$

Hence,

$$\frac{H(\sigma)}{\eta_{I,\Theta}(x_1,x_2)} = \frac{\sigma^{-1}\, f_I(\mu|x_1,x_2)}{\eta_{I,\Theta_2|\mu}(x_1,\mu)\, f_I(x_2|\mu,x_1)}$$

must hold, finally implying

(56) $$H(\sigma) = \zeta_{I,\Theta}(\mu,\sigma) = \sigma^{-1} .$$

The findings of the present subsection can thus be 
recapitulated as follows: 

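Proposition 5. The consistency factors $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$, $\zeta_{I,\Theta_2|\mu}(\mu,\sigma)$ and $\zeta_{I,\Theta}(\mu,\sigma)$ for the parameters of location-scale families of continuous direct probability distributions read $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) = 1$ and $\zeta_{I,\Theta_2|\mu}(\mu,\sigma) = \zeta_{I,\Theta}(\mu,\sigma) = \sigma^{-1}$.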
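
The link between the two factors of Proposition 5 through the transformation rule (45) can be checked numerically. The snippet below is a minimal sketch, assuming the reparameterization $\lambda = \ln\sigma$ and the multiplier set to unity; the function names and the finite-difference step are hypothetical choices made for illustration.

```python
import math

def zeta_sigma(sigma):
    """Consistency factor for a scale parameter as stated in Proposition 5."""
    return 1.0 / sigma

def zeta_lambda(lam, h=1e-6):
    """Transformed factor for lambda = ln(sigma) via rule (45), with the multiplier set to 1."""
    inverse = math.exp   # sigma = exp(lambda) is the inverse of lambda = ln(sigma)
    # Central finite difference for the derivative of the inverse transformation.
    jacobian = abs((inverse(lam + h) - inverse(lam - h)) / (2.0 * h))
    return zeta_sigma(inverse(lam)) * jacobian

# Evaluate the transformed factor at a few arbitrary points of the real axis.
values = [zeta_lambda(lam) for lam in (-2.0, -0.5, 0.0, 1.0, 3.0)]
```

Up to finite-difference error, every entry of `values` equals one, i.e., the factor $\sigma^{-1}$ for a scale parameter corresponds to a constant factor for the location parameter $\ln\sigma$.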

4.4 On integrability and on uniqueness of the 
consistency factors 

It is easily verified that normalizability of pdf's (21) from location-scale families guarantees also normalizability (integrability) of all the pdf's that were involved in the foregoing derivations of the consistency factors. No requirement concerning integrability, however, has ever been imposed on the consistency factors themselves. Moreover, it is evident that consistency factors $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ (53), defined on the entire real axis, are not integrable, implying that none of the consistency factors for scalar parameters of parametric families that are invariant under the action of a one-dimensional Lie group is integrable.

Let $\zeta_{I,\Theta}(\theta)$ be a non-integrable consistency factor for a parameter from a family $I$ of continuous direct probability distributions. Suppose for a moment that apart from the conditional pdf's $f_I(x|\theta)$ and $f_I(\theta|x)$, there also exist the non-informative prior pdf $f_I(\theta)$ and the joint pdf $f_I(\theta,x)$. Then, there exists an unconditional predictive pdf $f_I(x)$ (see, for example, Shao (1999), §4.1.1, Theorem 4.1, p. 194), such that

(57) $$f_I(\theta|x) = \frac{f_I(\theta)\, f_I(x|\theta)}{f_I(x)} .$$

But apart from Bayes' Theorem (57), $f_I(\theta|x)$ is also subjected to Proposition 3, implying that $f_I(\theta)$ and $\zeta_{I,\Theta}(\theta)$ are equal up to an arbitrary multiplication constant. Since then $f_I(\theta)$ is not integrable, the non-informative pdf $f_I(\theta)$ does not exist, and consequently, neither do $f_I(\theta,x)$ and the underlying probability space $(\Omega,\Sigma,P)$ exist. The pdf's $f_I(x|\theta)$ and $f_I(\theta|x)$ therefore represent an extension of the concept of the conditional probability distribution that was introduced in Subsection 2.1.

Since every consistency factor is determined only up to an arbitrary multiplicative factor (Property 1), infinitely many different consistency factors for a parameter from a particular parametric family exist. Nevertheless, unlike non-unique non-informative prior probability distributions (recall the assertions quoted in the introductory remarks), for a scalar parameter of a family of direct probability distributions whose invariance is associated with a one-dimensional Lie group, for example, the consistency factors are unique in that they all lead to the same inverse probability distribution.

4.5 Discussion 

Above, the consistency factors were deduced exclusively by presuming existence of the inverse probability distributions and by making use of the invariance of the families of direct probability distributions that is related to Lie groups. The resulting set of the families with possible probabilistic parametric inference is limited: for example, for scalar random variables $X$ and scalar parameters $\Theta$ the probabilistic parametric inference is in this way restricted to location parameters (or to parameters that are reducible to location parameters by one-to-one transformations). On the other hand, several principles were proposed for determination of the non-informative prior probability distributions. Here, applicability of these principles for determination of the consistency factors is investigated in order to extend the domain of the probabilistic parametric inference.

For example, if adapted for determination of consistency factors, Bayes' Postulate (Bayes (1763)), also referred to as the Laplace Principle of Insufficient Reason (Laplace (1886), p. XVII), suggests that all consistency factors should be uniform. Clearly, this is inadmissible since in general the constant consistency factors contradict expressions (45) and (47) for transformations of the consistency factors under reparameterizations.

A sophisticated version of the Principle of Insufficient Reason is referred to as the Principle of Maximum Entropy. In our context, the information entropy (Shannon (1948), §6) reads

$$S = -\int_{V_\Theta} \zeta_{I,\Theta}(\theta)\, \ln\{\zeta_{I,\Theta}(\theta)\}\, d\theta ,$$

while the Principle of Maximum Entropy states (Jaynes (2003), §11.3, p. 350) that the consistency factor which maximizes the entropy represents the most honest description of what we know about the value of the inferred parameter. For compact parameter spaces $V_\Theta$ for which the above integral exists, the principle again results in constant consistency factors $\zeta_{I,\Theta}(\theta) = e^{-1}$. The factors are then flawed in the same way as the factors implied by Bayes' Postulate. Jaynes ((2003), §12.3, pp. 374-377) argues that the above expression for the entropy is inappropriate since it is not invariant under reparameterization and proposes a Kullback-Leibler divergence (also called relative entropy) to replace it:

$$S = -\int_{V_\Theta} \zeta_{I,\Theta}(\theta)\, \ln\left\{\frac{\zeta_{I,\Theta}(\theta)}{m(\theta)}\right\} d\theta ,$$

where $m(\theta)$ is the reference measure function. Due to the unknown form of the latter, however, maximization of the relative entropy does not lead to unique consistency factors.

If Jeffreys' general rule is applied (Jeffreys (1946)), the consistency factors are determined via the determinant of the Fisher information matrix $\mathcal{I}_{I,\Theta}(\theta)$, $\zeta_{I,\Theta}(\theta) \propto \sqrt{\det[\mathcal{I}_{I,\Theta}(\theta)]}$, where the elements of the matrix are given by

$$[\mathcal{I}_{I,\Theta}(\theta)]_{ij} = \int f_I(x|\theta)\, \partial_{\theta_i} \ln f_I(x|\theta)\, \partial_{\theta_j} \ln f_I(x|\theta)\, d^n x .$$

The obtained consistency factors satisfy requirements (45) and (47) for transformations of the factors under reparameterization, but are flawed in another way. Let, for example, a probability distribution $N(\mu,\sigma)$ for a random variable $X$ belong to
the ri ormal (or Gaussian) family (jStuart and Qrd 
(|200d ^. §5.36, p. 191). Then, Jeffrey's general rule 



yields the consistency factors $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) \propto 1$, $\zeta_{I,\Theta_2|\mu}(\mu,\sigma) \propto \sigma^{-1}$ and $\zeta_{I,\Theta}(\mu,\sigma) \propto \sigma^{-2}$, such that the resulting inverse probability distributions violate the product rule (55).
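
The conflict with the product rule (55) can be seen in a short numerical experiment. The sketch below rests on stated assumptions: a normal likelihood, two hypothetical observations, a fixed value of $\mu$, and a midpoint grid in $\sigma$; all names are illustrative. It compares the density of $\sigma$ given $\mu$ obtained sequentially (conditional factor $\sigma^{-1}$, then a Bayes' Theorem update with the second observation) with the conditional density implied by a joint factor $\sigma^{p}$.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normalized(values, step):
    total = sum(values) * step   # Riemann-sum normalization on the grid
    return [v / total for v in values]

def sigma_given_mu(power, mu, xs, sigmas, step):
    """Density in sigma proportional to sigma**power times the likelihood of xs at fixed mu."""
    raw = []
    for s in sigmas:
        like = 1.0
        for x in xs:
            like *= normal_pdf(x, mu, s)
        raw.append(s ** power * like)
    return normalized(raw, step)

step = 0.01
sigmas = [step * (i + 0.5) for i in range(2000)]   # midpoints covering (0, 20)
mu, x1, x2 = 0.2, 1.0, -0.5                        # hypothetical values

# Sequential route: conditional factor 1/sigma, then update with x2 via Bayes' Theorem.
sequential = sigma_given_mu(-1, mu, [x1, x2], sigmas, step)

def max_gap(joint_power):
    """Largest pointwise gap between the sequential density and the conditional from the joint."""
    from_joint = sigma_given_mu(joint_power, mu, [x1, x2], sigmas, step)
    return max(abs(a - b) for a, b in zip(sequential, from_joint))

gap_inverse_sigma = max_gap(-1)           # joint factor 1/sigma
gap_inverse_sigma_squared = max_gap(-2)   # joint factor 1/sigma**2
```

With the joint factor $\sigma^{-1}$ the two routes agree on the grid, whereas with $\sigma^{-2}$ they differ visibly, which is the kind of inconsistency the text attributes to combining $\sigma^{-1}$ and $\sigma^{-2}$.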

A modification of Jeffreys' general rule by Bernardo (1979) called the reference prior approach leads to violations of the same product rule (Bernardo (1979), §3.3, pp. 118-119). Also, let $X_1, \ldots, X_n$ be independent random variables with identical probability distribution $N(\mu,\sigma)$. Since the normal family is a location-scale family of continuous distributions, the consistency factor (56) yields a unique posterior pdf $f_I(\bar s^{-1}(\lambda,\sigma)|x)$ for the parameter $(\Theta_1,\Theta_2)$ of the distribution, whereas the posterior pdf for $(\Lambda,\Theta_2) = \bar s \circ (\Theta_1,\Theta_2)$, $\Lambda \equiv \Theta_1/\Theta_2$, is obtained according to (45) (Property 3), $f_{I'}(\lambda,\sigma|x) = f_I(\bar s^{-1}(\lambda,\sigma)|x)\, |\partial_{(\lambda,\sigma)} \bar s^{-1}(\lambda,\sigma)| = \sigma f_I(\bar s^{-1}(\lambda,\sigma)|x)$. A unique $f_{I'}(\lambda,\sigma|x)$ further implies a unique marginal pdf $f_{I'}(\lambda|x)$,

$$f_{I'}(\lambda|x) = \int_0^\infty f_{I'}(\lambda,\sigma|x)\, d\sigma \propto \exp\{-n\lambda^2/2\} \int_0^\infty u^{n-2} \exp\left\{-\frac{u^2}{2} + r\lambda u\right\} du ,$$

$r \equiv \left(\sum x_i\right) / \sqrt{\sum x_i^2}$, while the reference prior approach leads to

$$f_{I'}(\lambda|x) \propto \frac{\exp\{-n\lambda^2/2\}}{\sqrt{1 + \lambda^2/2}} \int_0^\infty u^{n-1} \exp\left\{-\frac{u^2}{2} + r\lambda u\right\} du$$

(Bernardo (1979), §5.1, pp. 122-123). In this way, since the two expressions for $f_{I'}(\lambda|x)$ are incompatible, inconsistency of the reference prior approach with the probabilistic parametric inference is once more demonstrated.

Invariance theory has played an important role in the theory of non-informative prior probability distributions (see, for example, Hartigan (1964); Jaynes (1968) and (2003), Chapter 12, pp. 372-396; Dawid et al. (1973), Section 2, pp. 195-199; Villegas (1971) and (1981); Eaton (1989); Kass and Wasserman (1996), §3.2, pp. 1347-1348). Functional equations (49) and (50), for instance, correspond to what has been called the Principle of Relative Invariance (Hartigan (1964)). Since the relative invariance of the consistency factors is implied immediately by the existence of the inverse probability distributions, the Principle of Relative Invariance, when applied to consistency factors, is redundant. Contrary to what is demonstrated above, it has also been believed that the Principle is insufficient to determine uniquely defined priors (consistency factors) (Hartigan (1964), §4, p. 838 and §10, p. 845; Villegas (1977), §2, p. 454; Kass and Wasserman (1996), §3.2, p. 1348).

If multipliers $\chi_{I,\Theta}(a)$ and $\chi_{I,\Theta_{1,2}|\theta_{2,1}}(a)$ are set to unity, equations (49) and (50) lead to inner (or form invariant) consistency factors (Villegas (1977); Harney (2003), §2.3, pp. 11-12 and §6.3, pp. 53-54). Since, however, the form invariant consistency factors for location-scale families, $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) \propto 1$, $\zeta_{I,\Theta_2|\mu}(\mu,\sigma) \propto \sigma^{-1}$ and $\zeta_{I,\Theta}(\mu,\sigma) \propto \sigma^{-2}$, lead to a violation of the product rule (55), the Principle of Form Invariance is inconsistent with the probabilistic parametric inference.

When a parameter space $V_\Theta$ of a family $I$ is identical to the symmetry group $G$ of the family, every realization $\theta$ of the parameter identifies both an element of the family $I$ and an element in $G$. If, in addition, the left action $l : G \times G \to G$ coincides with the composition of the group elements $a$ and $\theta$, $l(a,\theta) = a \circ \theta$, the form invariant consistency factors $\zeta_{I,\Theta}(\theta)$ are called the left Haar consistency factors, where left Haar is due to the multiplication of $\theta$ by $a$ from the left and due to the fact that $\nu_{l,H}(d^m\theta) = \zeta_{I,\Theta}(\theta)\, d^m\theta$ leads to the left-invariant Haar measure

$$\nu_{l,H}(B) = \int_B \zeta_{I,\Theta}(\theta)\, d^m\theta$$

on $\mathbb{R}^m$ (Haar, 1933), i.e., $\nu_{l,H}(a \circ B) = \nu_{l,H}(B)$ for all $a \in G$ and $B \in \mathcal{B}^m$, where $a \circ B = \{a \circ \theta : \theta \in B\}$. Likewise, when $k(a,\theta) = \theta \circ a$, the consistency factors that solve the functional equation $\zeta_{I,\Theta}(\theta) = \zeta_{I,\Theta}[k(a^{-1},\theta)]\, |\partial_\theta k(a^{-1},\theta)|$ are called right Haar consistency factors on $G$. When $G$ is a topological group, e.g., a Lie group, both the left and the right Haar measures (consistency factors) exist and are unique, each up to a positive multiplication constant (Nachbin (1965)), but the two measures (consistency factors) need not coincide. For the location-scale families, for example, $k[a,(\mu,\sigma)] = (\mu + a_1\sigma, a_2\sigma)$ induces the right Haar consistency factor $\zeta_{I,\Theta}(\mu,\sigma) = \sigma^{-1}$ which, in contrast to the corresponding left Haar factor, does not lead to the violation of the product rule (55).
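
The two functional equations for the location-scale group can be verified numerically. The following is only a sketch under assumptions: the left action is taken to be the composition of affine maps, $l[a,(\mu,\sigma)] = (a_1 + a_2\mu, a_2\sigma)$, which is not written out in the text, and the helper names, the group element and the test point are hypothetical. The residual vanishes when a candidate factor solves the corresponding equation.

```python
def right_action_inverse(a, theta):
    """k(a^{-1}, theta) for k[a, (mu, sigma)] = (mu + a1*sigma, a2*sigma)."""
    a1, a2 = a
    mu, sigma = theta
    return (mu - a1 * sigma / a2, sigma / a2)

def left_action_inverse(a, theta):
    """l(a^{-1}, theta) for the assumed left action l[a, (mu, sigma)] = (a1 + a2*mu, a2*sigma)."""
    a1, a2 = a
    mu, sigma = theta
    return ((mu - a1) / a2, sigma / a2)

def jacobian_det(mapping, a, theta, h=1e-6):
    """Absolute Jacobian determinant of theta -> mapping(a, theta) by central differences."""
    mu, sigma = theta
    up, down = mapping(a, (mu + h, sigma)), mapping(a, (mu - h, sigma))
    d_mu = [(p - m) / (2.0 * h) for p, m in zip(up, down)]
    up, down = mapping(a, (mu, sigma + h)), mapping(a, (mu, sigma - h))
    d_sigma = [(p - m) / (2.0 * h) for p, m in zip(up, down)]
    return abs(d_mu[0] * d_sigma[1] - d_mu[1] * d_sigma[0])

def residual(zeta, mapping, a, theta):
    """zeta(theta) - zeta[mapping(a, theta)] * |Jacobian|; zero when the equation holds."""
    return zeta(theta) - zeta(mapping(a, theta)) * jacobian_det(mapping, a, theta)

inv_sigma = lambda th: 1.0 / th[1]
inv_sigma_sq = lambda th: 1.0 / th[1] ** 2
a, theta = (0.7, 2.5), (-1.2, 0.8)   # arbitrary group element and parameter point

right_with_inv_sigma = residual(inv_sigma, right_action_inverse, a, theta)
left_with_inv_sigma_sq = residual(inv_sigma_sq, left_action_inverse, a, theta)
right_with_inv_sigma_sq = residual(inv_sigma_sq, right_action_inverse, a, theta)
left_with_inv_sigma = residual(inv_sigma, left_action_inverse, a, theta)
```

In this sketch $\sigma^{-1}$ solves the right-action equation and $\sigma^{-2}$ the left-action one, while the crossed combinations leave a clearly non-zero residual.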

Several additional desirable properties are established for the right Haar consistency factors (see Section 5.2 below for an example). Nevertheless,
Eaton and SudderthI (|l99ill999l . l2002l l showed that 
unless the symmetry groups are further restricted to, for example, amenable groups, the probability distributions based on the predictive pdf's that are obtained by applications of the right Haar consistency factors (priors) are not generally consistent with the probability axioms. We cannot tell, though, whether or not the right-invariant consistency factors based on the restricted groups extend the collection of families for which the probabilistic parametric inference is possible.

In summary, except possibly for the principle that 
identifies consistency factors with the right Haar fac- 
tors for the underlying symmetry group G, all the 
principles discussed are either redundant, inconsis- 
tent with the probabilistic parametric inference, or 
do not lead to unique consistency factors. 

5. INTERPRETATIONS OF PROBABILITY 
DISTRIBUTIONS 

Every axiomatic (abstract) theory admits, as is 
well known, of an unlimited number of concrete 
interpretations besides those from which it was 
derived. Thus we find applications in fields of 
science which have no relations to the concepts 
of random event and of probability in the precise 
meaning of these words. 

Kolmogorov (1933), Chapter 1, p. 1.



5.1 Probability distributions, relative frequencies 
and degrees of belief 

So far, a mathematical theory of probabilistic 
parametric inference has been discussed. In the 
present section, however, two concepts of probabil- 
ity distributions are introduced that link the math- 
ematical theory to an external world of measurable 
phenomena: the concept of relative frequencies in re- 
peated trials, and the concept of degrees of belief in 
hypotheses or propositions (i.e., in statements that 
can be either true or false) concerning values of in- 
ferred parameters of parametric families. 

Suppose an experiment is repeated under identical
conditions, but the outcomes vary from one repetition
of the experiment to another. If a numerical
characteristic assigned to the outcomes of the experiment
follows no describable deterministic pattern,
the experiment is called a random experiment, the
outcomes of the experiment are called random events,
and the underlying process of such an experiment
is called a random process. Let random events be
mutually independent. Then, within the frequency
interpretation of probability distributions, the direct
probability distribution for a random variable $X$,
linked to the experiment, is assumed to coincide with
the long term distribution of relative frequencies of
particular outcomes of the experiment,



$$F_I(x|\theta) = \lim_{N \to \infty} \frac{N_{X \le x}}{N} ,$$



where $N$ is the total number of repetitions of the
experiment and $N_{X \le x}$ is the number of the repetitions
with outcomes whose numerical characteristic
is less than or equal to $x$. Henceforth, the frequency
interpretation of direct probability distributions is
assumed.
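The limiting relation between relative frequencies and the direct cdf can be illustrated with a small simulation. This is only a sketch under stated assumptions: the direct distribution is taken to be a standard normal (an arbitrary choice for illustration), the number of repetitions is finite, and the helper names are hypothetical.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Direct cdf F(x | mu, sigma) of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def relative_frequency(samples, x):
    """N_{X <= x} / N for a finite record of outcomes."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(12345)            # fixed seed so the run is reproducible
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

for x in (-1.0, 0.0, 1.0):
    # The two columns should agree more closely as N grows.
    print(x, relative_frequency(samples, x), normal_cdf(x))
```

With a finite number of repetitions the agreement is only approximate; the frequency interpretation refers to the limit of infinitely many repetitions.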

Inverse probability distributions, on the other
hand, are used to express one's degrees of belief that,
given a (finite) recorded sequence $x_1, x_2, \ldots$ of realizations
of independent random variables $X_1, X_2, \ldots$
with an identical probability distribution from a
parametric family $I$, the so-called true value of the
parameter of the family (i.e., the value of the
parameter that uniquely determines the true limiting
frequency distribution of the realizations) lies
within a certain region of the parameter space. Several
strong arguments exist for inverse probability
distributions being the ideal for parametric inferences,
for example the so-called Dutch Book Theorem,
emerging from the work of Ramsey (1931,
Chapter VII, pp. 156-198), de Finetti (1931, 1937),
Shimony (1955) and Kemeny (1955), and Cox's Theorem
(Cox, 1946). For a concise review of the two
Theorems see, for example, Paris (1994), Chapter 3,
pp. 19-33.

While being identical objects from a mathematical
perspective, the direct and the inverse probability
distributions obviously have different interpretations.
Contrary to the distribution of realizations
of random variables $X_i$, in most situations
the realization of a parameter (the inferred
true value of the parameter) is unknown but fixed.
Several authors overlooked this important difference
between the frequency distributions and the distributions
of someone's beliefs (see, for example,
Lehmann (1986), §1.6, p. 14; Shao (1999), §7.1.3,
p. 431; Casella and Berger (2002), §7.2.3, p. 324 and
§9.2.4, pp. 435-436; Harney (2003), §2.5, p. 18). It
should be noticed, however, that the developed theory
of probabilistic parametric inference still provides
verifiable predictions in terms of relative frequencies
of confidence intervals covering the true
value of the parameter (see Section 5.2 below). The
theory is then both operational and objective.

5.2 Calibration 

Definition 16 (Confidence intervals). Let
$f_I(\theta|x)$ be a pdf of a probability distribution for a
scalar parameter $\Theta$, $V_\Theta = (\theta_a, \theta_b)$, given realization
$x$ of a scalar random variable $X$ from a parametric
family $I$. A confidence interval $(\theta_1(x), \theta_2(x)) \subset V_\Theta$
is defined via the system of equations

$$Pr_I((\theta_a,\theta_1)|x) = \int_{\theta_a}^{\theta_1} f_I(\theta|x)\,d\theta = \alpha$$

and

$$Pr_I((\theta_1,\theta_2)|x) = \int_{\theta_1}^{\theta_2} f_I(\theta|x)\,d\theta = \delta ,$$

where $\delta \in [0, 1]$ and $\alpha \in [0, 1 - \delta]$. The number $\delta$ is
called the probability content of the interval.
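The two defining equations amount to reading off two quantiles of the inverse distribution. The sketch below illustrates this under an assumption that is not taken from the text: the inverse distribution for the parameter is a normal distribution, so that its inverse cdf is available in closed form through the Python standard library. The function name is hypothetical, and the open bounds on the probabilities are a restriction of this sketch, needed because the normal quantile function is undefined at 0 and 1.

```python
from statistics import NormalDist

def interval_from_inverse_cdf(inverse_dist, alpha, delta):
    """Return (theta_1, theta_2) with Pr((theta_a, theta_1)|x) = alpha and
    Pr((theta_1, theta_2)|x) = delta, using the quantile function."""
    if not (0.0 < alpha and 0.0 < delta and alpha + delta < 1.0):
        raise ValueError("this sketch needs 0 < alpha and alpha + delta < 1")
    theta_1 = inverse_dist.inv_cdf(alpha)
    theta_2 = inverse_dist.inv_cdf(alpha + delta)
    return theta_1, theta_2

# Assumed inverse distribution for the parameter, given some realization x.
posterior = NormalDist(mu=2.0, sigma=1.0)
theta_1, theta_2 = interval_from_inverse_cdf(posterior, alpha=0.05, delta=0.90)
print(theta_1, theta_2)
print(posterior.cdf(theta_2) - posterior.cdf(theta_1))  # probability content
```

Different choices of $\alpha$ for the same $\delta$ give different intervals with the same probability content, which is why the definition fixes both numbers.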

Higher-dimensional confidence regions, e.g., $m$-dimensional
confidence rectangles ($m \ge 2$), for
vector parameters are defined in a similar way.

Definition 17 (Calibration). Let $x_1, \ldots, x_n$ be
a set of realizations of independent continuous random
variables $X_1, \ldots, X_n$ from a parametric family
$I$ of direct probability distributions. The inverse
probability distributions, assigned to the inferred parameter
of the family $I$, given realizations $x_i$, are
called calibrated if, in the limit $n \to \infty$, the coverage
of the corresponding confidence regions (i.e., the
relative frequency of the regions that cover the true
values of the inferred parameter) coincides with the
probability content $\delta$ of the region.

Calibration of probability distributions for inferences
about location and scale parameters is
guaranteed by the fact that the consistency factors
$\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$, $\zeta_{I,\Theta_2|\mu}(\mu,\sigma)$ and $\zeta_{I,\Theta}(\mu,\sigma)$, determined
in Subsection 4.3, coincide with the right
Haar factors for the group $\mathbb{R}$ for summations, for
the group $\mathbb{R}^+$ for multiplications, and for the group
$\mathbb{R} \times \mathbb{R}^+$ for operations (24), respectively (Stein (1965);
Chang and Villegas (1986)). That is to say, the resulting
confidence regions coincide with the so-called
classical confidence regions, first propounded by
Neyman (1937). It should be noticed that this holds
true even if the true value of the inferred parameter
arbitrarily varies from realization of one random
variable to another.
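The coverage statement can be illustrated by simulation for the simplest case of a location parameter. This is a sketch under assumptions that go beyond the text: the direct family is normal with unit scale, the consistency factor for the location parameter is taken to be constant (so the inverse distribution for $\mu$ given $x$ is normal, centred at $x$), and the true value of $\mu$ is redrawn for every repetition to mimic arbitrary variation. The function name and the range of true values are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(n_trials, delta, seed=2024):
    """Relative frequency of central intervals that cover the true location."""
    rng = random.Random(seed)
    alpha = (1.0 - delta) / 2.0
    std = NormalDist()                      # standard normal quantiles
    z_lo, z_hi = std.inv_cdf(alpha), std.inv_cdf(alpha + delta)
    hits = 0
    for _ in range(n_trials):
        true_mu = rng.uniform(-50.0, 50.0)  # true value varies between trials
        x = rng.gauss(true_mu, 1.0)         # one realization of X
        # Assumed inverse distribution: normal, centred at x, unit scale.
        theta_1, theta_2 = x + z_lo, x + z_hi
        if theta_1 < true_mu < theta_2:
            hits += 1
    return hits / n_trials

print(coverage(50_000, delta=0.90))  # should be close to 0.90
```

The observed coverage stays near the probability content even though the true parameter value changes from one repetition to the next, in line with the remark above.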

It can further be shown that the consistency factors
for location and scale parameters, determined
in Subsection 4.3, provide for a simple frequency interpretation
of the predictive distributions.

To relate probabilistic parametric inference to another
concept, that of fiducial inference, let
$F_I(x|\lambda)$ be a cdf for a continuous one-dimensional
random variable $X$ that is either strictly increasing
or strictly decreasing in a scalar parameter $\lambda$. Then,
a sufficient condition for an inverse probability distribution
to be calibrated, the so-called fiducial condition
by Fisher (1956, §3.6, p. 70), reads:

$$f_I(\lambda|x) = \big|\partial_\lambda F_I(x|\lambda)\big| . \tag{58}$$

Observe that for the inverse pdf's, assigned to location
and scale parameters by using the consistency
factors (53) and (54), the condition (58) is satisfied.
Also, it is easily shown that congruence with the
fiducial condition is preserved under updating that
is made in accordance with Bayes' Theorem.
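As a worked illustration of the fiducial condition, consider a scale family that is not taken from the text: the exponential family with cdf $F_I(x|\sigma) = 1 - e^{-x/\sigma}$, $x > 0$. The sketch assumes that the consistency factor for a scale parameter is proportional to $\sigma^{-1}$, as the right Haar factor for the multiplicative group suggests.

```latex
% Assumed example: exponential scale family, consistency factor ~ 1/sigma.
\begin{align*}
  f_I(x|\sigma) &= \partial_x F_I(x|\sigma) = \sigma^{-1} e^{-x/\sigma} , \\
  f_I(\sigma|x) &\propto \sigma^{-1} f_I(x|\sigma) = \sigma^{-2} e^{-x/\sigma} , \\
  \int_0^\infty \sigma^{-2} e^{-x/\sigma}\, d\sigma
      &= \int_0^\infty e^{-x u}\, du = \frac{1}{x}
      \qquad (u = 1/\sigma) , \\
  f_I(\sigma|x) &= x\, \sigma^{-2} e^{-x/\sigma} , \\
  \big|\partial_\sigma F_I(x|\sigma)\big|
      &= \big| -e^{-x/\sigma}\, x\, \sigma^{-2} \big|
       = x\, \sigma^{-2} e^{-x/\sigma} = f_I(\sigma|x) .
\end{align*}
```

Under these assumptions the normalized inverse pdf and the absolute derivative of the direct cdf with respect to the parameter agree, which is what the fiducial condition requires.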

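Conformity with the fiducial condition (58) is invariant
under one-to-one transformations $Y = s \circ X$
and $\Theta = s \circ \Lambda$ with non-vanishing derivatives $s'(\theta)$:

$$f_{I'}(\theta|y) = f_I\big(s^{-1}(\theta)\,|\,s^{-1}(y)\big)\,\big|[s^{-1}(\theta)]'\big|
 = \big|\partial_{s^{-1}(\theta)} F_I\big(s^{-1}(\theta)\,|\,s^{-1}(y)\big)\,[s^{-1}(\theta)]'\big|$$

and therefore

$$f_{I'}(\theta|y) = \big|\partial_\theta F_{I'}(\theta|y)\big| ,$$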
where the last equality is due to equation

$$F_{I'}(\theta|y) = \begin{cases}
  F_I\big(s^{-1}(\theta)\,|\,s^{-1}(y)\big) ; & s'(\theta) > 0 \\
  1 - F_I\big(s^{-1}(\theta)\,|\,s^{-1}(y)\big) ; & s'(\theta) < 0
\end{cases}$$

that follows immediately from the definition of the
inverse cdf's and from equation (7). In addition, by
combining equation (44) from Proposition 3 with the
above fiducial condition we obtain:

$$\zeta_{I,\Lambda}(\lambda)\,\partial_x F_I(x|\lambda) \pm \eta_{I,\Lambda}(x)\,\partial_\lambda F_I(x|\lambda) = 0 , \tag{59}$$

where the upper (lower) sign stands for cdf's which
are strictly decreasing (increasing) in $\lambda$. By defining
$H(x,\lambda) \equiv s(x) \mp s(\lambda)$, with $s(x)$ and $s(\lambda)$ being related
to $\zeta_{I,\Lambda}(\lambda)$ and $\eta_{I,\Lambda}(x)$ as $s'(x) = \eta_{I,\Lambda}(x)$ and
$s'(\lambda) = \zeta_{I,\Lambda}(\lambda)$, functional equation (59) can be reduced
to (30). Recall that the most general solution
$F_I(x|\lambda)$ of equation (30) implies existence of a cdf
$F_{I'}(y|\mu)$ for $Y = s \circ X$ from a location-scale family
$I'$ with $\mu = \pm s(\lambda)$ being a realization of the location
parameter $\Theta_1 = s \circ \Lambda$, whereas the scale parameter
$\Theta_2$ of the family $I'$ is set to 1. That is, the fiducial
condition (58) and the requirement (44) of Proposition 3
combined imply reducibility of an inferred parameter
to a location parameter. (Lindley (1958) obtained
the same result by combining the calibration
condition (58) and Bayes' Theorem (57).) For scalar
parameters, the consistency factors that were deduced
on the basis of invariance of parametric families
under the action of one-dimensional Lie groups
are therefore the only consistency factors for which
the resulting inverse probability distributions satisfy
the fiducial condition (58).

6. CONCLUSIONS

For scalar parameters, invariance of a parametric
family of direct probability distributions under
the action of a one-dimensional Lie group leads to
unique inverse probability distributions. The concept
of invariance is equivalent to the concept of
fiducial distributions, combined with implications of
Proposition 3: both concepts lead to identical inverse
distributions and are applicable under the same conditions.
When this is observed, the original idea of
Bayes (1763) and Laplace (1886) of embedding parametric
inference in the framework of probability theory
becomes perfectly compatible with the concept
of the classical confidence intervals (Neyman, 1937)
and with the concept of the fiducial distributions
(Fisher, 1935). Therefore, provided that adherents
of the Bayesian schools of parametric inference are
willing to give up the notion of non-informative prior
probability distributions, while at the same time adherents
of the frequentist schools are willing to adopt
a broader concept of random variable that leads
to existence of inverse probability distributions, a
reconciliation between different paradigms can be
reached, probably the same kind of reconciliation
that Kendall (1949) had in mind when he
wrote: "Neither party can avoid ideas of the other in
order to set up and justify a comprehensive theory."

APPENDIX A: PROOFS OF PROPOSITIONS 
AND LEMMATA 

A.1 Proof of Proposition 1

The left-hand side of (10) can be rewritten as

(60)

A.2 Proof of Lemma 1

Let $r : \mathbb{R} \times G \to \mathbb{R}$, $r(x,a) \equiv l(a^{-1},x)$, be the



^U 



Hu) 



(S) 



z-i(s) 



Ly-1 



{U)i^)dPi^) 



^Y-^u){^)'^z-^s){^)dPiuj) 



lu{y)ls{z)dPrxiy,z) 
/x(y,z)d"yrf'"z 



n vTB"! 



UxS 



UxS Jx(z) 



//i(z)/x(z)d'"z, 
Js 



U (z B^ and S G B"^, where B^ is a restriction of 
B"^ to Vz while 



Mz) ^ / 
Jv 



/x(y,z) 
u /x(z) 



d"y. 



In (60), the first equality follows from Definition 9,
the third equality follows from the change of variables
Theorem (Dudley (1989), §4.1, p. 92), while the
last equality follows from Fubini's Theorem (Bartle
(1966), Chapter 10, pp. 119-120). Inserting (11) into
the right-hand side of (10) yields, on the other hand,



$$\int_S \left[ \int_U f_X(y|z)\,d^n y \right] f_X(z)\,d^m z = \int_S k(z)\,f_X(z)\,d^m z .$$



Let $S_{1,2} \equiv \{z : h(z) \gtrless k(z)\}$. Then, the equality
of $h(z)$ and $k(z)$ $Pr_X$-almost everywhere on $V_Z$ follows
immediately from Fatou's Lemma (see, for example,
Bartle (1966), Chapter 4, Corollary 4.10 of
Fatou's Lemma, pp. 34-35), while the equality of
$f_X(y,z)/f_X(z)$ and $f_X(y|z)$ $\nu_L$-almost everywhere
on $\mathbb{R}^n$ is obtained in an analogous way.



right action of $G$ on $\mathbb{R}$. Then, $r(x, a \circ b) = r[r(x,a), b]$
holds true for all $a, b \in G$ and for all $x \in \mathbb{R}$. A
differentiation of $r(x, a \circ b)$ with respect to $a$ thus
yields

$$\partial_{a \circ b}\, r(x, a \circ b)\; \partial_a (a \circ b) = \partial_{r(x,a)}\, r[r(x,a), b]\; \partial_a r(x,a) ,$$

which for $b = a^{-1}$ reduces to

$$\partial_c r(x,c)\big|_{c=e}\; \partial_a (a \circ b)\big|_{b=a^{-1}} = \partial_{r(x,a)}\, r[r(x,a), b]\big|_{b=a^{-1}}\; \partial_a r(x,a) ,$$

$c \equiv a \circ b$. The left-hand side of the above equation
is zero due to the premise of the Lemma,

$$\partial_c r(x,c)\big|_{c=e} = \partial_c l(c^{-1},x)\big|_{c=e} = 0 .$$

On the right-hand side, however, the first term,

$$\partial_{r(x,a)}\, r[r(x,a), b]\big|_{b=a^{-1}} = \partial_y l(a^{-1},y) = \partial_y g_{a^{-1}}(y) ,$$

is non-vanishing for all admissible values of the index
$a$ and for all real $y = g_a(x)$ since differentiability
of $l(a,x)$ with respect to $x$ for every $a$ is assumed.
Then, $\partial_a r(x,a) = \partial_a l(a^{-1},x) = 0$ is implied for all
permissible $a$, i.e., $g_a(x)$ is permitted to depend on
$x$ only, say $g_a(x) = h(x)$. When $g_e(x) = x$ is invoked,
this further means $h(x) = x$ and the Lemma
is proved.

A.3 Proof of Lemma 2

Suppose there exists a realization $\lambda_0$ of $\Lambda$ for
which the partial derivative (28) vanishes. Since the
family $I$ of direct distributions is invariant under
$\mathcal{G}$, equation (29) applies which, when differentiated
with respect to $a$ and set afterwards $a = e$, yields

$$\partial_x F_I(x|\lambda)\, \partial_a l(a^{-1},x)\big|_{a=e} = -\partial_\lambda F_I(x|\lambda)\, \partial_a l(a^{-1},\lambda)\big|_{a=e} .$$

The second term on the right-hand side of the above
equation vanishes for $\lambda = \lambda_0$, which implies

$$\partial_a l(a^{-1},x)\big|_{a=e} = 0 ; \quad \forall x \in V_X .$$

This means, according to Lemma 1, that all transformations
$g_a \in \mathcal{G}$ are trivial for all $x \in V_X(\lambda)$, which
is in direct contradiction with the initial premises,
so that the proof is completed.

A.4 Proof of Lemma 3

It is easily shown that every cdf $F_I(x|\lambda)$ of the
form (34) solves (30). In order to demonstrate that
the cdf's of the form (34) are also the only solutions
of (30), suppose for a moment that $F_I(x|\lambda)$
can be written in terms of two independent variables,
$H(x,\lambda)$ and $K(x,\lambda) \equiv s(x) + s(\lambda)$,

$$F_I(x|\lambda) = \Phi[H(x,\lambda), K(x,\lambda)] , \tag{61}$$

where the functions $s(x)$ and $s(\lambda)$ are those that
enter $H(x,\lambda)$. Inserting (61) into (30) yields

$$2\, s'(x)\, s'(\lambda)\, \partial_K \Phi(H,K) = 0 .$$

Therefore, for $s'(x), s'(\lambda) \neq 0$, $\partial_K \Phi(H,K)$ must vanish
identically, such that the form (34) of $F_I(x|\lambda)$
is implied. If, on the other hand, any of $s'(x)$ and
$s'(\lambda)$ vanishes, $H(x,\lambda)$ and $K(x,\lambda)$ cease to be independent,
i.e., $K(x,\lambda) = K[H(x,\lambda)]$, such that (34)
again holds true, but since in this case $F_I(x|\lambda)$ is
either a function of $x$ alone, a function of $\lambda$ alone, or
a constant, such a solution is inadmissible for a cdf
from a parametric family.

A.5 Proof of Proposition 3

According to the premises of the Proposition, a
positive $f_I(\theta_1|\theta_2,x_1,x_2)$ exists and can be decomposed
according to (41). Let $\theta_1' \in V_{\Theta_1}(x_{1,2},\theta_2)$ be
another realization of $\Theta_1$ fulfilling the conditions of
the Proposition, such that

$$f_I(\theta_1'|\theta_2,x_1,x_2)
 = \frac{f_I(\theta_1'|\theta_2,x_2)\, f_I(x_1|\theta_1',\theta_2)}{f_I(x_1|\theta_2,x_2)}
 = \frac{f_I(\theta_1'|\theta_2,x_1)\, f_I(x_2|\theta_1',\theta_2)}{f_I(x_2|\theta_2,x_1)}$$

is also positive. Dividing the above equation with
(41) yields

$$\frac{k(x_1,\theta_1',\theta_2)}{k(x_1,\theta_1,\theta_2)} = \frac{k(x_2,\theta_1',\theta_2)}{k(x_2,\theta_1,\theta_2)} , \tag{62}$$

where

$$k(x_{1,2},\theta_1^{(\prime)},\theta_2) \equiv f_I(\theta_1^{(\prime)}|\theta_2,x_{1,2}) / f_I(x_{1,2}|\theta_1^{(\prime)},\theta_2) .$$

Clearly, in order to ensure equality in (62) for all
$x_1$ and $x_2$ for which $f_I(x_{1,2}|\theta_1^{(\prime)},\theta_2) > 0$, the left-hand
and the right-hand side of the equation must
be independent of $x_1$ and $x_2$, but may depend on
$\theta_1$, $\theta_1'$, and $\theta_2$:

$$\frac{k(x_{1,2},\theta_1',\theta_2)}{k(x_{1,2},\theta_1,\theta_2)} = q(\theta_1,\theta_1',\theta_2) .$$

The function $q(\theta_1,\theta_1',\theta_2)$ is factorizable,

$$q(\theta_1,\theta_1',\theta_2) = \frac{\zeta_{I,\Theta_1|\theta_2}(\theta_1',\theta_2)}{\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)} ,$$

such that

$$\frac{1}{\eta_{I,\Theta_1|\theta_2}(x_{1,2},\theta_2)}
 \equiv \frac{k(x_{1,2},\theta_1',\theta_2)}{\zeta_{I,\Theta_1|\theta_2}(\theta_1',\theta_2)}
 = \frac{k(x_{1,2},\theta_1,\theta_2)}{\zeta_{I,\Theta_1|\theta_2}(\theta_1,\theta_2)} ,$$

which proves equation (43), while equation (44) is
proved in a similar way by invoking (42) instead of
(41).



A.6 Proof of Proposition 4

Suppose for a moment that a pdf for $\Theta$, $f_I(\theta|x)$,
can be assigned to $\theta \in V_\Theta$ based on a realization $x$
for which the partial derivative (27) vanishes. Since the
family $I$ of direct probability distributions is $G$-invariant,
the distributions assigned to $\Theta$ are invariant
under the induced group $\mathcal{G}$ such that equation
(51) applies. When differentiated with respect to $a$
and set afterwards $a = e$, (51) further implies

$$\partial_x F_I(\theta|x)\, \partial_a l(a^{-1},x)\big|_{a=e} = -\partial_\theta F_I(\theta|x)\, \partial_a l(a^{-1},\theta)\big|_{a=e}$$

for all $\theta \in V_\Theta$. The left-hand side of the above equation
vanishes due to the premises adopted at the beginning
of the proof. Since, by Lemma 2, the second
term on the right-hand side does not vanish anywhere
on $V_\Theta$, $\partial_\theta F_I(\theta|x) = f_I(\theta|x)$ must vanish for
all $\theta \in V_\Theta$, which is incompatible with the normalization
requirement. Therefore, the assumed existence
of $f_I(\theta|x)$, based on $x$ with vanishing derivative
(27), inevitably leads to inconsistencies and is
thus ruled out.

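A.7 Proof of Lemma 4

Equation (52) holds true for all $\mu, a_1 \in \mathbb{R}$ and
for all $\sigma, a_2 \in \mathbb{R}^+$. For $a_1 = \mu$ and $a_2 = \sigma$ we obtain
$h(\mu,\sigma) = \zeta_{I,\Theta_1|\sigma}(\mu,\sigma)/\zeta_{I,\Theta_1|\sigma}(0,1)$, while setting
$a_1 = \mu$ and $a_2 = 1$ reveals factorizability of
$\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$:

$$\zeta_{I,\Theta_1|\sigma}(\mu,\sigma) = \frac{\zeta_{I,\Theta_1|\sigma}(\mu,1)\, \zeta_{I,\Theta_1|\sigma}(0,\sigma)}{\zeta_{I,\Theta_1|\sigma}(0,1)} . \tag{63}$$

By taking these findings into account, equation (52)
reduces to

$$\zeta_{I,\Theta_1|\sigma}(\mu,1)\, \zeta_{I,\Theta_1|\sigma}(0,\sigma)\, [\zeta_{I,\Theta_1|\sigma}(0,1)]^2 =
 \zeta_{I,\Theta_1|\sigma}(a_1,1)\, \zeta_{I,\Theta_1|\sigma}(0,a_2) \times
 \zeta_{I,\Theta_1|\sigma}[(\mu - a_1)/a_2, 1]\, \zeta_{I,\Theta_1|\sigma}(0,\sigma/a_2) ,$$

which for $a_1 = 0$ and $a_2 = \sigma$ yields $\zeta_{I,\Theta_1|\sigma}(\mu,1) =
\zeta_{I,\Theta_1|\sigma}(\mu/\sigma,1)$. Hence, $\zeta_{I,\Theta_1|\sigma}(\mu,1)$ must be a constant,
such that, according to (55), $\zeta_{I,\Theta_1|\sigma}(\mu,\sigma)$ is a
function of $\sigma$ alone.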
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